This PR addresses two issues that impact users when a switch zone is unavailable:
When resolving which switch slot is managed by which switch zone, we would continuously retry whenever there was a communication error. This causes RPWs to stop running and sagas to get stuck whenever a switch zone becomes unavailable but still has entries in DNS.
In the `dpd_ensure` node of the `instance_start` saga, we were bailing out if we encountered any errors while notifying dpd of NAT changes. This means that if one of our switch zones is unavailable, we no longer allow users to start instances, which is probably a bit too strict.

This PR makes the following adjustments to the switch slot resolution behavior:
This PR makes the following changes to the behavior of the NAT configuration steps for instance-related sagas:
Return a `Result` from the `notify` function so that callers can decide whether to act on the error.

Related
#8206
#6896
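
As an illustration of the NAT change described above, here is a minimal sketch of a `notify` function that returns a `Result` and leaves the policy decision to the caller. Everything except the name `notify` is hypothetical: `SwitchSlot`, `NotifyError`, `ensure_nat`, and the `reachable` flag (which stands in for a real call to dpd) are made up for the example, and the "succeed if at least one switch was notified" policy is an assumption, not necessarily what this PR implements.

```rust
// Hypothetical sketch: none of these types or signatures are taken from the
// actual codebase. `reachable` stands in for a real network call to dpd.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SwitchSlot {
    Switch0,
    Switch1,
}

#[derive(Debug, PartialEq, Eq)]
struct NotifyError {
    slot: SwitchSlot,
    reason: String,
}

/// Notify one switch's dpd of a NAT change, reporting failure to the caller
/// instead of deciding here whether the failure is fatal.
fn notify(slot: SwitchSlot, reachable: bool) -> Result<(), NotifyError> {
    if reachable {
        Ok(())
    } else {
        Err(NotifyError {
            slot,
            reason: "switch zone unavailable".to_string(),
        })
    }
}

/// Caller-side policy (an assumption for this sketch): succeed as long as at
/// least one switch was notified. The tolerated failures are returned in the
/// `Ok` variant so they can still be logged; if every switch failed, all
/// failures are returned in the `Err` variant.
fn ensure_nat(
    switches: &[(SwitchSlot, bool)],
) -> Result<Vec<NotifyError>, Vec<NotifyError>> {
    let mut failures = Vec::new();
    let mut successes = 0usize;

    for &(slot, reachable) in switches {
        match notify(slot, reachable) {
            Ok(()) => successes += 1,
            Err(e) => failures.push(e),
        }
    }

    if successes > 0 {
        Ok(failures)
    } else {
        Err(failures)
    }
}
```

With this shape, a saga node calling `ensure_nat` with one reachable and one unreachable switch would proceed and could log the single tolerated failure, while a call where every switch is unreachable would still fail the node.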