Commit Graph

2 Commits

Author SHA1 Message Date
Fraser Waters 49d8298e84
Fix snapshot integrity on pending replacement ()
This fixes a snapshot integrity issue with delete before replace, failed
creates, and multiple updates.

This was caused by https://github.com/pulumi/pulumi/pull/16510, where we
started removing pending replace resources as part of their replacement
steps. That change was a well-intentioned fix to stop trying to delete
resources we'd already deleted in a previous update.

The bug would manifest if the following happened:
1. An update goes to replace a resource with `deleteBeforeReplace` set.
It deletes the old resource (and saves the state as pending replace) but
then fails to create the new replacement resource.
2. A second update is run to try to create the replacement resource. At
this point the bug manifests: we delete the pending replace resource
from state, and if the new resource then fails to create again we end up
with an invalid snapshot.

The fix is very simple. If a resource is already pending replacement we
just don't issue a delete/remove step at all. The pending replace
resource will get cleaned up at the end by the create step creating the
new version.
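
The scenario and the fix can be sketched with a toy model. This is a
minimal illustration, not the engine's code: `snapshot_is_valid` and
`retry_replacement` are hypothetical names, a snapshot is modelled as a
dict mapping each resource name to the names it depends on, and the
integrity rule (every dependency must still be present) is an assumption
made for the example.

```python
def snapshot_is_valid(snapshot: dict) -> bool:
    """Assumed integrity rule: every dependency must exist in the snapshot."""
    return all(dep in snapshot for deps in snapshot.values() for dep in deps)


def retry_replacement(snapshot: dict, name: str, remove_first: bool,
                      create_fails: bool) -> dict:
    """Return the snapshot after a second update retries replacing `name`.

    `name` is assumed to be pending replacement from an earlier failed update.
    """
    result = dict(snapshot)  # shallow copy so the input is left untouched
    deps = result[name]
    if remove_first:
        # Old behaviour: drop the pending replace entry before creating.
        del result[name]
    if not create_fails:
        # A successful create writes the new version, superseding the old entry.
        result[name] = deps
    return result


snap = {"db": [], "app": ["db"]}  # "db" is pending replacement
broken = retry_replacement(snap, "db", remove_first=True, create_fails=True)
fixed = retry_replacement(snap, "db", remove_first=False, create_fails=True)
```

In this model `broken` leaves `app` pointing at a `db` entry that no
longer exists, while `fixed` keeps the pending replace entry until a
create eventually succeeds.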

Fixes 
Fixes 
Fixes 
Fixes 
Fixes 
2024-09-04 10:52:43 +00:00
Will Jones 3f27ee9688
Don't re-delete resources that are `PendingReplacement` ()
As well as indicating that a resource's state has changes, a diff can
also indicate that those changes require the _replacement_ of the
resource, meaning that it must be recreated and not just updated. In
this scenario, there are two possible ways to replace the resource -- by
first creating another new resource before deleting the old one
("create-before-replace"), or by first deleting the old resource before
creating its replacement ("delete-before-replace").
Create-before-replace is the default because, where it can be
implemented, it generally results in fewer instances of "downtime", where
a desired resource does not exist in the system.

Should delete-before-replace be chosen, Pulumi implements this under the
hood as three steps: delete for replacement, replace, and create
replacement. To track things consistently, and to enable resumption of
an interrupted operation, Pulumi writes a flag, `PendingReplacement`,
to the state of the deleted resource, which a completed replacement
later cleans up.
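
As a rough sketch of those three steps, assuming a hypothetical in-memory
state keyed by URN (the `Resource` class and
`run_delete_before_replace` are illustrative names, not Pulumi's types):

```python
from dataclasses import dataclass


@dataclass
class Resource:
    urn: str
    version: int
    pending_replacement: bool = False


def run_delete_before_replace(state: dict, urn: str,
                              create_fails: bool = False) -> list:
    """Apply the three steps to `state`; return the steps that completed."""
    done = []

    # Step 1: delete for replacement. The real resource is gone, but its
    # state entry is kept and flagged so an interrupted run can be resumed.
    state[urn].pending_replacement = True
    done.append("delete for replacement")

    # Step 2: replace. A logical marker tying the delete to the create.
    done.append("replace")

    # Step 3: create replacement. On success the flagged entry is superseded.
    if create_fails:
        return done
    state[urn] = Resource(urn, state[urn].version + 1)
    done.append("create replacement")
    return done
```

If the create fails, the entry stays in `state` with
`pending_replacement` set, which is the information a resumed operation
needs in order to know that the delete has already happened.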

Should an interrupted operation be resumed, Pulumi does not currently
take `PendingReplacement` into account, and always enqueues a(nother)
delete operation. This is typically fine, since deletes are (or should
be) idempotent, but it is wasteful and unnecessary. This commit adds
@jesse-triplewhale's fix for this behaviour whereby the
`PendingReplacement` flag is simply removed before the remainder of the
required steps (replace, create replacement) are actioned as normal. It
also extends this work with some lifecycle tests for this scenario and a
few others that may arise as a result of an interrupted replacement.
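
The change in step planning might be pictured as follows. This is only a
sketch: `plan_replacement` and the step labels are hypothetical and do
not correspond to identifiers in the engine.

```python
def plan_replacement(pending_replacement: bool) -> list:
    """Hypothetical planner: list the steps for a delete-before-replace."""
    if pending_replacement:
        # Already deleted by an earlier, interrupted run: clear the flag
        # instead of asking the provider to delete again.
        first = "remove pending replacement"
    else:
        first = "delete for replacement"
    return [first, "replace", "create replacement"]


fresh = plan_replacement(pending_replacement=False)
resumed = plan_replacement(pending_replacement=True)
```

Here `resumed` contains no provider delete, while `fresh` is the usual
three-step sequence.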

Fixes 
Closes 

Co-authored-by: Jesse Grodman <jesse@triplewhale.com>
2024-06-28 23:16:20 +00:00