Fail mark-for-deployment when re-deploying same version without --wait-for-deployment #4307
cuza wants to merge 8 commits into
)
print(deployment_version)
print("Continuing anyway.")
if not args.block:
i think we'd want to flip this, no?
args.block is True with --wait-for-deployment, and we want to wait until the deploy group is healthy in that case rather than forging ahead?
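A minimal sketch of the flipped branch suggested in this comment. `same_version_action` and its return labels are hypothetical names used only for illustration; `block` stands in for `args.block`, which the comment says is True when `--wait-for-deployment` is passed.

```python
def same_version_action(block: bool) -> str:
    """Pick a path when the requested version is already marked for deployment.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if block:
        # --wait-for-deployment was passed: wait for the deploy group to be healthy.
        return "wait_for_healthy"
    # No --wait-for-deployment: behave as before and carry on.
    return "continue_anyway"
```

The point is only the direction of the condition: the health wait belongs on the `block` branch, not the `not block` branch.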
f"what is set to be deployed in deploy group {deploy_group}:"
)
print(f" {deployment_version}")
print("Checking if all instances are healthy before proceeding...")
i think we might also want to do something slightly different here - we probably want to essentially pretend that we're doing a normal --wait-for-deployment bounce and poll until the deploy group is empty (and then time out after whatever the usual timeout is set to)
i.e., we want to treat an unhealthy deploy group as if it was previously on another version and wait until the "new" (really the same version, we're just re-polling again) version is healthy before continuing
(and if --wait-for-deployment is not set, then don't do anything different: just yolo as usual)
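A rough sketch of the "re-poll the same version until healthy, then time out" idea from the two comments above. Everything here is an assumption for illustration: `wait_until_healthy`, `is_healthy`, `timeout_s` and `poll_interval_s` are hypothetical names, and the real code would presumably reuse the existing --wait-for-deployment machinery and its configured timeout rather than a hand-rolled loop.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll is_healthy() until it returns True or timeout_s elapses.

    Returns 0 if the deploy group became healthy, 1 on timeout.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return 0
        if clock() >= deadline:
            return 1
        sleep(poll_interval_s)
```

If the group is already healthy the first check returns immediately, so the "already deployed and fine" case needs no separate branch.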
…config for deployment validation
instance_health = [
    check_if_instance_is_done(
        service=service,
        instance=instance_config.get_instance(),
        cluster=cluster,
        version=deployment_version,
        instance_config=instance_config,
    )
    for cluster, instance_configs in instance_configs_per_cluster.items()
    for instance_config in instance_configs
]
all_healthy = all(instance_health)
if all_healthy:
    print(
        "All instances are healthy at this version. "
        "Safe to proceed to the next deploy group."
    )
    return 0
else:
    print(
        "Error: Not all instances are healthy for this version. "
        "A previous deploy may have failed or timed out. "
        "Not safe to proceed to the next deploy group."
    )
    return 1
might be worth essentially doing what a normal m-f-d does and running this logic in a loop until all the instances are healthy (or i guess some percentage are healthy - i think we only require bounce_margin_factor % of instances to be healthy to proceed?) rather than doing the check once and exiting
i think doing the logic in a loop would probably also allow us to remove the special-casing here, since if everything is healthy we'd exit that loop immediately, and if not, we'd keep rechecking
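To make the "some percentage are healthy" variant concrete, here is a small sketch of a threshold check that could replace `all(instance_health)` inside such a loop. `enough_healthy` and `required_fraction` are hypothetical names; whether `bounce_margin_factor` is the right threshold is only suggested in this thread, not confirmed, so the default below keeps the stricter all-instances behaviour.

```python
from typing import Sequence


def enough_healthy(
    instance_health: Sequence[bool],
    required_fraction: float = 1.0,
) -> bool:
    """Return True if at least required_fraction of instances report healthy.

    An empty list counts as healthy, matching what all([]) returns in the diff.
    """
    if not instance_health:
        return True
    healthy = sum(1 for ok in instance_health if ok)
    return healthy / len(instance_health) >= required_fraction
```

With `required_fraction=1.0` this is equivalent to the existing `all(...)` check, so a fractional threshold could be adopted later without changing the loop around it.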
Prevent deployment when attempting to redeploy the same version without the
--wait-for-deployment flag.