Fuzzed lifecycle tests work slightly differently to their handwritten
counterparts. A handwritten test will typically "start from nothing". An
initial snapshot is built from an empty state and starting program,
before subsequent operations are executed on this state to test
behaviour. In contrast, fuzzed tests create arbitrary starting
snapshots "out of thin air", before running an operation to see if a bug
can be triggered. Ideally, any starting state conjured by a fuzz test is
actually reproducible from an empty state and some combination of
operations, but it may be that this is not the case, or that the number
of operations required to reach the state is very high. In such cases,
it is handy to have the exact code the fuzz test used available when
reproducing and isolating behaviour. To this end, this commit extends
the `reprogen` functionality of the suite to generate this code as well
as the existing "handwritten" approximation. This should also aid in
minimising failing test cases quickly when bugs are found.
In #17623 through #17627 and some follow-up PRs, we built out a
framework for fuzzing lifecycle tests in order to help track down
snapshot integrity violations in the Pulumi engine. All that remains now
is to provide useful ways to trigger a fuzzing run.
This commit kicks this off by introducing two Go test functions that can
be run with `go test` or our `Makefile`:
* `TestFuzz` -- this runs the fuzzer and generates a brand new set of
scenarios (1,000 by default) and checks whether any of them result in a
snapshot integrity error. This test is skipped unless an environment
variable is set (which the `Makefile` handles if one runs
`make test_lifecycle_fuzz`). The intended purpose of this test is to back one
or more CI workflows that will run periodically in order to slowly
explore the state space.
* `TestFuzzFromStateFile` -- this accepts a path to a JSON state file
(such as that produced by a `pulumi stack export`) and uses that state
to seed the fuzzer, subsequently trying to find provider and operation
configurations that lead to a snapshot integrity error. This test is
skipped unless a state file path is set using the relevant environment
variable. The intended purpose of this test is to make it possible to
find root causes for user issues when all we have is a state and we'd
like to guess the program/provider configurations that led to an issue.
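
As a rough illustration of the "skipped unless configured" behaviour described above, the gating logic might look something like the sketch below. The environment variable names are hypothetical placeholders and are not the names the suite actually uses; only the default of 1,000 scenarios comes from the description above. In a real test, a plan with `Skip` set would translate into a call to `t.Skip(plan.Reason)`.

```go
package main

import "fmt"

// Hypothetical variable names, used for illustration only.
const (
	envRunFuzz   = "EXAMPLE_LIFECYCLE_FUZZ"
	envStateFile = "EXAMPLE_LIFECYCLE_FUZZ_STATE_FILE"
)

// defaultIterations mirrors the "1,000 by default" mentioned above.
const defaultIterations = 1000

// fuzzPlan describes what a fuzzing entry point should do.
type fuzzPlan struct {
	Skip       bool   // true if the test should be skipped
	Reason     string // why the test is being skipped
	Iterations int    // number of scenarios to generate
	StateFile  string // optional JSON state file used to seed the fuzzer
}

// planFuzz decides whether a fresh fuzzing run should happen.
func planFuzz(getenv func(string) string) fuzzPlan {
	if getenv(envRunFuzz) == "" {
		return fuzzPlan{Skip: true, Reason: envRunFuzz + " is not set"}
	}
	return fuzzPlan{Iterations: defaultIterations}
}

// planFuzzFromStateFile decides whether a state-seeded run should happen.
func planFuzzFromStateFile(getenv func(string) string) fuzzPlan {
	path := getenv(envStateFile)
	if path == "" {
		return fuzzPlan{Skip: true, Reason: envStateFile + " is not set"}
	}
	return fuzzPlan{Iterations: defaultIterations, StateFile: path}
}

func main() {
	// Simulate an environment in which only the state file is configured.
	env := map[string]string{envStateFile: "stack.json"}
	getenv := func(key string) string { return env[key] }

	fmt.Println("fresh run skipped:", planFuzz(getenv).Skip)
	fmt.Println("state file:", planFuzzFromStateFile(getenv).StateFile)
}
```

Passing `getenv` as a function (rather than calling `os.Getenv` directly) is only a convenience for exercising the logic without touching the real environment.
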
Alongside introducing these two tests, we bulk out the fuzzing
documentation a bit to help engineers run them, and link to the new
sections from the existing docs on snapshot integrity issues.
It's often the case that we want to move or rename resources in our
Pulumi programs. Doing so can result in a change in the resource's URN,
which means that Pulumi will, by default, see the move as a deletion of
the old resource and a creation of the new resource. We can tell Pulumi
that a resource has been renamed by using *aliases*, whereby a resource
can be annotated with its previous URNs. Pulumi will then use these URNs
in several places:
* When looking for a resource's previous state, Pulumi will try to find
it using the new URN first and any aliases second.
* When writing out a new snapshot, Pulumi will *normalize* all URNs
(e.g. those in provider, parent and dependency references) to remove
aliases and replace them with the new URNs they resolve to.
Alas, a familiar story presents itself in the second case -- Pulumi does
not currently normalize `DeletedWith` references. This can result in
snapshot integrity errors as Pulumi leaves stale references in the
snapshot before writing it. This commit addresses this omission, using
the now-preferred `GetAllDependencies` method introduced in #17320 to
hopefully stop this from happening again in this part of the codebase.
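
To make the normalization step concrete, here is a minimal sketch of the idea, assuming a heavily simplified resource type. The `resource` struct, `resolve`, and `normalize` are hypothetical names for illustration; this is not the engine's code and it does not use `GetAllDependencies`. Provider references are omitted for brevity.

```go
package main

import "fmt"

// resource is a simplified, hypothetical stand-in for a snapshot entry.
type resource struct {
	URN          string
	Parent       string
	Dependencies []string
	DeletedWith  string
}

// resolve returns the current URN for u, given a map from old (aliased)
// URNs to the new URNs they resolve to.
func resolve(aliases map[string]string, u string) string {
	if current, ok := aliases[u]; ok {
		return current
	}
	return u
}

// normalize rewrites every reference held by r so that no aliased URN
// survives. Forgetting any one field (such as DeletedWith) would leave a
// stale reference behind in the snapshot.
func normalize(aliases map[string]string, r *resource) {
	r.Parent = resolve(aliases, r.Parent)
	for i, dep := range r.Dependencies {
		r.Dependencies[i] = resolve(aliases, dep)
	}
	r.DeletedWith = resolve(aliases, r.DeletedWith)
}

func main() {
	oldURN := "urn:pulumi:dev::proj::pkg:index:Bucket::old-name"
	newURN := "urn:pulumi:dev::proj::pkg:index:Bucket::new-name"
	aliases := map[string]string{oldURN: newURN}

	child := resource{
		URN:         "urn:pulumi:dev::proj::pkg:index:Object::child",
		DeletedWith: oldURN,
	}
	normalize(aliases, &child)
	fmt.Println(child.DeletedWith == newURN)
}
```
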
Fixes #17614
In #17623 and the PRs that followed it, we added fuzz testing
capabilities to the engine's lifecycle test suite, with a view to
randomly generating test cases in the hopes of proactively finding
snapshot integrity bugs in our code. This commit extends the fuzzing
library to generate a reproducing lifecycle test case in the event that
a snapshot integrity error is found, hopefully aiding in debugging and
pinning down the exact cause of the error. Code is written to a file in
a temporary directory, which may optionally be overridden using an
environment variable. This might be useful in e.g. a GitHub action that
fuzzes periodically so that failing cases' reproductions can be made
available as artifacts for download.
Part of #17213
In #17623 and the PRs that followed it, we added fuzz testing
capabilities to the engine's lifecycle test suite, with a view to
randomly generating test cases in the hopes of proactively finding
snapshot integrity bugs in our code. This commit extends the fuzzer so
that it randomly parents, reparents, and aliases reparented resources,
covering the various parent/child relationships that these actions lead
to and which can result in snapshot integrity issues if handled
improperly. In particular, we can now fuzz the following:
* Randomly parenting a resource to another resource in an initial
snapshot.
* Randomly updating a resource in a program to add, remove, or change
its parent.
* Randomly aliasing a resource whose parent (and consequently URN) has
been changed by a program to point back to the URN it had originally.
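
For readers less familiar with why reparenting matters, the sketch below shows how a parent's type becomes part of a child's URN, and how an alias lets a lookup fall back to the original URN. The `registration` type and the `newURN` and `findOldState` helpers are hypothetical and exist only for this illustration; they are not the fuzzer's or the engine's actual API.

```go
package main

import "fmt"

// newURN builds a URN of the form
// urn:pulumi:<stack>::<project>::<qualified type>::<name>, where the
// qualified type is the parent's qualified type and the resource's own
// type joined by "$". An empty parentType means "no parent".
func newURN(stack, project, parentType, typ, name string) string {
	qualified := typ
	if parentType != "" {
		qualified = parentType + "$" + typ
	}
	return fmt.Sprintf("urn:pulumi:%s::%s::%s::%s", stack, project, qualified, name)
}

// registration is a hypothetical, simplified resource registration.
type registration struct {
	URN     string
	Aliases []string
}

// findOldState looks for previous state under the new URN first and under
// any aliases second, returning the URN that matched.
func findOldState(old map[string]bool, reg registration) (string, bool) {
	if old[reg.URN] {
		return reg.URN, true
	}
	for _, alias := range reg.Aliases {
		if old[alias] {
			return alias, true
		}
	}
	return "", false
}

func main() {
	before := newURN("dev", "proj", "", "pkg:index:Bucket", "data")
	after := newURN("dev", "proj", "pkg:index:Component", "pkg:index:Bucket", "data")
	old := map[string]bool{before: true}

	// Without an alias the reparented resource looks brand new.
	_, found := findOldState(old, registration{URN: after})
	fmt.Println("found without alias:", found)

	// Aliasing back to the original URN lets the old state be found.
	_, found = findOldState(old, registration{URN: after, Aliases: []string{before}})
	fmt.Println("found with alias:", found)
}
```
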
Part of #17213
Snapshot integrity errors are very problematic when they occur and can
be hard to spot and prevent. To this end, #17213 outlines a plan to
introduce [fuzzing](https://en.wikipedia.org/wiki/Fuzzing) to our suite
of lifecycle tests in order to find cases and executions which might
violate snapshot integrity. This commit extends the `fuzzing` package of
the suite to support generating random fixtures and adds documentation
for the tactics employed when doing so.
A fixture comprises an initial snapshot, a program to run against that
initial snapshot, a set of providers to use during that program's
execution, and an operation to run. Fixtures can then be generated as
part of a Rapid property test, allowing us to fuzz random combinations
in order to hunt down snapshot integrity issues.
Part of #17213
Snapshot integrity errors are very problematic when they occur and can
be hard to spot and prevent. To this end, #17213 outlines a plan to
introduce [fuzzing](https://en.wikipedia.org/wiki/Fuzzing) to our suite
of lifecycle tests in order to find cases and executions which might
violate snapshot integrity. This commit extends the `fuzzing` package of
the suite to support generating random providers. A provider may
configure operations such as `Create`, `Diff`, etc. to fail or succeed
in a number of ways (e.g. no diff, changes, replaces, etc.) on a per-URN
basis, so as to exercise a variety of code paths in step
generation/execution.
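
A per-URN provider configuration of this kind might be modelled roughly as follows. All of the names here (`diffAction`, `providerSpec`, and so on) are hypothetical and are not the types used by the `fuzzing` package; the sketch only shows the shape of the idea for `Diff` and `Create`.

```go
package main

import "fmt"

// diffAction is a hypothetical enumeration of the outcomes a fuzzed
// provider might return from Diff for a given URN.
type diffAction int

const (
	diffNone    diffAction = iota // report no changes
	diffUpdate                    // report changes that need an update
	diffReplace                   // report changes that need a replace
	diffFail                      // return an error
)

// providerSpec configures provider behaviour on a per-URN basis. URNs that
// are not mentioned get the default behaviour (no diff, successful create).
type providerSpec struct {
	Diff        map[string]diffAction
	CreateFails map[string]bool
}

// diff returns the configured outcome for urn, or an error if the provider
// has been told to fail for that URN.
func (p providerSpec) diff(urn string) (diffAction, error) {
	action := p.Diff[urn]
	if action == diffFail {
		return diffNone, fmt.Errorf("diff failed for %s", urn)
	}
	return action, nil
}

// create succeeds unless the provider has been told to fail for urn.
func (p providerSpec) create(urn string) error {
	if p.CreateFails[urn] {
		return fmt.Errorf("create failed for %s", urn)
	}
	return nil
}

func main() {
	p := providerSpec{
		Diff:        map[string]diffAction{"urn:a": diffReplace},
		CreateFails: map[string]bool{"urn:b": true},
	}

	action, err := p.diff("urn:a")
	fmt.Println(action == diffReplace, err)
	fmt.Println(p.create("urn:b"))
}
```
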
Part of #17213
Snapshot integrity errors are very problematic when they occur and can
be hard to spot and prevent. To this end, #17213 outlines a plan to
introduce [fuzzing](https://en.wikipedia.org/wiki/Fuzzing) to our suite
of lifecycle tests in order to find cases and executions which might
violate snapshot integrity. This commit extends the `fuzzing` package of
the suite to support generating random programs. A program is based upon
a snapshot and may randomly append and prepend new resources and copy,
drop or update existing ones.
Part of #17213
Snapshot integrity errors are very problematic when they occur and can
be hard to spot and prevent. To this end, #17213 outlines a plan to
introduce [fuzzing](https://en.wikipedia.org/wiki/Fuzzing) to our suite
of lifecycle tests in order to find cases and executions which might
violate snapshot integrity. This commit extends the `fuzzing` package of
the suite to support generating random valid snapshots, the idea being
that this will enable us to generate good starting states for our
property tests. A snapshot contains a number of random resources which
may depend on one another in valid ways. Snapshots may also contain
deleted resources which have not yet been cleaned from the snapshot.
Part of #17213
Snapshot integrity errors are very problematic when they occur and can
be hard to spot and prevent. To this end, #17213 outlines a plan to
introduce [fuzzing](https://en.wikipedia.org/wiki/Fuzzing) to our suite
of lifecycle tests in order to find cases and executions which might
violate snapshot integrity. This commit kicks this off by introducing a
`fuzzing` package to the suite and adding types and generators (from the
`pgregory.net/rapid` library) for generating random resources. The idea
is that from resources we can progress to generating random snapshots;
from there to programs and provider configurations and so on; and with
all these pieces execute random tests in an attempt to find snapshot
integrity bugs before our customers do.
Part of #17213