Preserve static IP across tear-down/stand-up cycles (#56)#69
Preserve static IP across tear-down/stand-up cycles (#56)#69
Conversation
WalkthroughThe PR implements static IP management in Terraform for GCP VMs and refactors deployment orchestration. It introduces a Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant StandUp as Stand-Up Script
participant KDM as KamalDeploymentManager
participant TF as Terraform
participant GCP as GCP/Compute Engine
participant DNS as DNS Resolver<br/>(Google 8.8.8.8)
participant Local as Local DNS Cache
User->>StandUp: ./stand-up
StandUp->>KDM: KamalDeploymentManager.new.deploy
Note over KDM,TF: Terraform Apply Phase
KDM->>KDM: import_static_ip_if_exists
KDM->>TF: Check existing state for google_compute_address
KDM->>TF: terraform apply (attach static IP)
TF->>GCP: Allocate/attach google_compute_address
GCP-->>KDM: VM deployed with static IP
Note over KDM,DNS: DNS Verification Phase
KDM->>DNS: Resolve configured domain
DNS-->>KDM: Returns IP address
KDM->>KDM: Verify IP matches deployed IP
KDM->>KDM: update_config sets `@ip_changed` flag
Note over KDM,Local: Cleanup & Guidance Phase
KDM->>Local: flush_local_dns_cache (OS-specific)
Local->>Local: macOS: dscacheutil -flushcache
Local->>Local: Linux: systemd-resolve flush
KDM->>KDM: print_ssl_note (if SSL enabled)
KDM-->>User: Deployment complete
rect rgba(100, 150, 200, 0.5)
Note over User,Local: Optional: Tear-Down with --keep-ip
User->>KDM: ./tear-down --keep-ip
KDM->>TF: Remove google_compute_address.rails_app_ip from state
KDM->>TF: terraform destroy (preserves static IP)
TF->>GCP: Destroy VM (IP remains reserved)
end
sequenceDiagram
actor User
participant TearDown as Tear-Down Script
participant KDM as KamalDeploymentManager
participant Kamal as Kamal Orchestrator
participant TF as Terraform
participant GCP as GCP/Compute Engine
User->>TearDown: ./tear-down [--keep-ip]
Note over TearDown: Parse CLI flags
alt --keep-ip flag provided
TearDown->>TF: Remove google_compute_address.rails_app_ip from state
TF->>TF: State updated (IP reservation preserved)
else No --keep-ip flag
Note over TF: Proceed normally
end
TearDown->>Kamal: Stop Kamal containers
Kamal->>GCP: Terminate running app containers
TearDown->>TF: terraform destroy
TF->>GCP: Destroy compute resources
GCP-->>TF: Destruction complete
alt --keep-ip was set
Note over GCP: Static IP remains reserved<br/>(costs ~$0.01/hr unattached)
else --keep-ip not set
GCP-->>User: All resources released
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@lib/deployment_manager.rb`:
- Around line 190-193: In method resolve_dns update the array literal assigned
to args so it has spaces inside the brackets to satisfy
Layout/SpaceInsideArrayLiteralBrackets; specifically change the args assignment
(the variable args in resolve_dns) from the compact form to the spaced form
(i.e., add a space after '[' and before ']'), leaving the subsequent conditional
append and the Open3.capture2 call unchanged.
🧹 Nitpick comments (4)
terraform-gcloud/bin/tear-down (1)
36-36: Debug statement left in — consider removing.This
echoprints the internal script directory on every run, which isn't useful to end users and looks like a leftover from development.Proposed fix
-echo "$SCRIPT_DIR is the script directory"README.md (1)
307-310: Consider mentioningresolvectlas the modern alternative tosystemd-resolve.The
systemd-resolvecommand is deprecated on newer systemd versions in favor ofresolvectl flush-caches. The Ruby code indeployment_manager.rbalready handles both, but the README only shows the older command.Proposed fix
**Linux (systemd):** ```bash -sudo systemd-resolve --flush-caches +sudo resolvectl flush-caches # or: sudo systemd-resolve --flush-caches</details> </blockquote></details> <details> <summary>lib/deployment_manager.rb (2)</summary><blockquote> `74-80`: **Suppressing stderr on `terraform import` hides actionable errors.** If the import fails for reasons other than "IP doesn't exist" (e.g., auth failure, API quota, wrong project), the user sees only "No existing static IP found" with no diagnostic info. Consider capturing stderr and logging it at a debug/verbose level, or at minimum not routing it to `File::NULL`. <details> <summary>Proposed fix</summary> ```diff - _, status = Open3.capture2("terraform", "import", "google_compute_address.rails_app_ip", import_id, - chdir: `@terraform_dir`, err: File::NULL) + _, err_output, status = Open3.capture3("terraform", "import", "google_compute_address.rails_app_ip", import_id, + chdir: `@terraform_dir`) if status.success? puts "✅ Imported existing static IP — DNS and deploy.yml remain stable" else - puts "ℹ️ No existing static IP found — Terraform will create a new one" + puts "ℹ️ No existing static IP found — Terraform will create a new one" + puts " (import stderr: #{err_output.strip})" unless err_output.strip.empty? end
173-188: Runningsudonon-interactively may stall the script if no password is cached.
system("sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder")will prompt for a password, potentially surprising users mid-deploy. Consider warning the user before invokingsudo, or wrapping it in a timeout.
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
The tear-down script now removes the static IP from Terraform state before destroying, so the IP reservation persists in GCP. The stand-up script (DeploymentManager) tries to import an existing static IP before terraform apply, reusing the same address. This eliminates deploy.yml and DNS churn across destroy/apply cycles. Use --release-ip with tear-down to fully release the IP when done. Fixes #56 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Safer for tutorial users who may forget about lingering resources. The stand-up script already handles IP changes in deploy.yml and DNS verification, so the cost of getting a new IP is low. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- stand-up and tear-down now handle --help/-h flags - DeploymentManager runs terraform init before apply to avoid "backend initialization required" errors on fresh checkouts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The stand-up script uses dig for DNS verification, which bypasses the OS DNS cache. This means verification passes but the browser still resolves the old IP. Now flushes the macOS/Linux DNS cache automatically after DNS verification when the server IP has changed. Also adds a "Stale DNS Cache" troubleshooting section to the README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The DNS verification loop now queries 8.8.8.8 instead of the system's default resolver, which can cache stale records for minutes. Google DNS reflects propagated changes much faster. After 3 failed attempts, the script prints troubleshooting tips inline (check registrar, verify record name, try dig manually) so users don't have to leave the terminal to debug. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add troubleshooting for ERR_CERTIFICATE_TRANSPARENCY_REQUIRED which Chrome shows when Let's Encrypt CT logs haven't propagated yet. This only affects fresh deployments (kamal setup), not subsequent deploys. Adds SSL warning to stand-up script output, verification checklist note in README, and detailed debugging steps in troubleshooting guide. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename lib/deployment_manager.rb to lib/kamal_deployment_manager.rb to clarify that this class manages Kamal deployments specifically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Provides coding agents with project architecture, key file relationships, machine-managed file boundaries, commands, and deployment gotchas so they can work in this codebase without extensive exploration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
f529702 to
644281e
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@lib/kamal_deployment_manager.rb`:
- Around line 174-189: The flush_local_dns_cache method uses system("sudo ...")
which can block for a password; modify flush_local_dns_cache to attempt
non-interactive sudo (use sudo -n or sudo -v check) when invoking the macOS and
Linux commands (the branches that call system with dscacheutil/killall and
systemd-resolve/resolvectl), detect failure/exit status indicating a password
prompt, and instead log/warn that DNS flush was skipped and advise the user to
run the commands manually or enable passwordless sudo; preserve the existing
RUBY_PLATFORM and file-existence checks and only change how the system calls are
invoked and handled so the deployment flow never blocks waiting for interactive
sudo.
- Around line 60-82: The import_static_ip_if_exists method hardcodes
"us-central1" when building import_id, causing silent failures if variables.tf
uses a different region; update the method to parse the region from variables.tf
(add or reuse a helper like parse_region_from_variables alongside
parse_project_id_from_variables), validate that the region is present, and
replace the hardcoded region in import_id with the parsed region (e.g.,
"projects/#{project_id}/regions/#{region}/addresses/rails-app-ip"); if parsing
fails, bail out the same way you do when project_id is missing.
In `@terraform-gcloud/bin/tear-down`:
- Around line 17-34: The flag parsing loop that sets KEEP_IP in the script
currently ignores unknown flags (for arg in "$@"; case ... esac) so typos like
--kepe-ip silently leave KEEP_IP=false; modify the case in the loop that checks
--keep-ip and --help|-h to add a catch-all (*) branch that prints the
usage/error and exits non‑zero; reference the existing KEEP_IP variable and the
for arg in "$@" / case $arg in construct and ensure the new branch echoes the
usage (same text as --help) and exits 1 to prevent accidental IP destruction.
🧹 Nitpick comments (3)
CLAUDE.md (1)
9-20: Add a language identifier to the fenced code block.Markdownlint flags this as
MD040. Since this is a text diagram rather than code, usetextas the language identifier.Proposed fix
-``` +```text terraform-gcloud/bin/stand-uplib/kamal_deployment_manager.rb (2)
74-76: Stderr suppression onterraform importhides actionable errors.Redirecting stderr to
File::NULLon the import call (line 76) swallows errors like auth failures or network issues, making debugging difficult. Consider capturing stderr and printing it when the import fails unexpectedly (i.e., distinguish "resource not found" from other errors).Proposed fix
- _, status = Open3.capture2("terraform", "import", "google_compute_address.rails_app_ip", import_id, - chdir: `@terraform_dir`, err: File::NULL) + _, err_output, status = Open3.capture3("terraform", "import", "google_compute_address.rails_app_ip", import_id, + chdir: `@terraform_dir`) if status.success? puts "✅ Imported existing static IP — DNS and deploy.yml remain stable" + elsif err_output.include?("Cannot import non-existent") || err_output.include?("not found") + puts "ℹ️ No existing static IP found — Terraform will create a new one" else - puts "ℹ️ No existing static IP found — Terraform will create a new one" + puts "⚠️ Static IP import failed unexpectedly:\n#{err_output.strip}" + puts "ℹ️ Continuing — Terraform will attempt to create a new one" end
84-89: Regex parsing ofvariables.tfis brittle but acceptable for a tutorial.The regex on line 88 assumes a specific HCL formatting (double-quoted default on the same block). It won't match
default = var.something, expressions, or single-quoted values. For this tutorial's controlledvariables.tf, this is fine, but worth a comment noting the assumption.
| def import_static_ip_if_exists | ||
| # If the IP is already in Terraform state, nothing to do. | ||
| _, status = Open3.capture2("terraform", "state", "show", "google_compute_address.rails_app_ip", | ||
| chdir: @terraform_dir, err: File::NULL) | ||
| if status.success? | ||
| puts "✅ Static IP already in Terraform state" | ||
| return | ||
| end | ||
|
|
||
| project_id = parse_project_id_from_variables | ||
| return unless project_id | ||
|
|
||
| # Try to import a static IP preserved from a previous tear-down. | ||
| puts "Checking for existing static IP to import..." | ||
| import_id = "projects/#{project_id}/regions/us-central1/addresses/rails-app-ip" | ||
| _, status = Open3.capture2("terraform", "import", "google_compute_address.rails_app_ip", import_id, | ||
| chdir: @terraform_dir, err: File::NULL) | ||
| if status.success? | ||
| puts "✅ Imported existing static IP — DNS and deploy.yml remain stable" | ||
| else | ||
| puts "ℹ️ No existing static IP found — Terraform will create a new one" | ||
| end | ||
| end |
There was a problem hiding this comment.
Hardcoded region us-central1 will silently fail if the user changes their GCP region.
Line 74 constructs the import ID with a hardcoded region. If the region in variables.tf differs, the import silently fails (prints "No existing static IP found") and Terraform creates a new IP, defeating the purpose of --keep-ip. Consider parsing the region from variables.tf the same way you parse project_id.
Proposed fix
def import_static_ip_if_exists
# If the IP is already in Terraform state, nothing to do.
_, status = Open3.capture2("terraform", "state", "show", "google_compute_address.rails_app_ip",
chdir: `@terraform_dir`, err: File::NULL)
if status.success?
puts "✅ Static IP already in Terraform state"
return
end
project_id = parse_project_id_from_variables
return unless project_id
+ region = parse_variable_default("region") || "us-central1"
+
# Try to import a static IP preserved from a previous tear-down.
puts "Checking for existing static IP to import..."
- import_id = "projects/#{project_id}/regions/us-central1/addresses/rails-app-ip"
+ import_id = "projects/#{project_id}/regions/#{region}/addresses/rails-app-ip"
_, status = Open3.capture2("terraform", "import", "google_compute_address.rails_app_ip", import_id,
chdir: `@terraform_dir`, err: File::NULL)And generalize the parser:
- def parse_project_id_from_variables
+ def parse_variable_default(name)
vars_file = File.join(`@terraform_dir`, "variables.tf")
return unless File.exist?(vars_file)
- File.read(vars_file)[/variable\s+"project_id".*?default\s*=\s*"([^"]+)"/m, 1]
+ File.read(vars_file)[/variable\s+"#{Regexp.escape(name)}".*?default\s*=\s*"([^"]+)"/m, 1]
end
+
+ def parse_project_id_from_variables
+ parse_variable_default("project_id")
+ end🤖 Prompt for AI Agents
In `@lib/kamal_deployment_manager.rb` around lines 60 - 82, The
import_static_ip_if_exists method hardcodes "us-central1" when building
import_id, causing silent failures if variables.tf uses a different region;
update the method to parse the region from variables.tf (add or reuse a helper
like parse_region_from_variables alongside parse_project_id_from_variables),
validate that the region is present, and replace the hardcoded region in
import_id with the parsed region (e.g.,
"projects/#{project_id}/regions/#{region}/addresses/rails-app-ip"); if parsing
fails, bail out the same way you do when project_id is missing.
| def flush_local_dns_cache | ||
| return unless @ip_changed | ||
|
|
||
| # dig (used by verify_dns) queries DNS servers directly, bypassing the OS cache. | ||
| # The browser and other apps use the OS cache, which may still have the old IP. | ||
| log_step "Flushing local DNS cache" | ||
| if RUBY_PLATFORM.include?("darwin") | ||
| puts "IP address changed — flushing macOS DNS cache..." | ||
| system("sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder") | ||
| elsif File.exist?("/usr/bin/systemd-resolve") || File.exist?("/usr/bin/resolvectl") | ||
| puts "IP address changed — flushing Linux DNS cache..." | ||
| system("sudo systemd-resolve --flush-caches 2>/dev/null || sudo resolvectl flush-caches 2>/dev/null") | ||
| else | ||
| puts "IP address changed — you may need to flush your local DNS cache manually." | ||
| end | ||
| end |
There was a problem hiding this comment.
sudo in flush_local_dns_cache will prompt for a password mid-automation.
The system("sudo ...") calls on lines 182 and 185 will block waiting for a password if the user doesn't have passwordless sudo configured. This interrupts the otherwise automated deployment flow. Consider warning the user or skipping with a note if sudo isn't cached.
Proposed fix — attempt non-interactively, skip if sudo not cached
def flush_local_dns_cache
return unless `@ip_changed`
log_step "Flushing local DNS cache"
if RUBY_PLATFORM.include?("darwin")
puts "IP address changed — flushing macOS DNS cache..."
- system("sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder")
+ unless system("sudo -n dscacheutil -flushcache && sudo -n killall -HUP mDNSResponder 2>/dev/null")
+ puts "⚠️ Could not flush DNS cache (sudo required). Run manually:"
+ puts " sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder"
+ end
elsif File.exist?("/usr/bin/systemd-resolve") || File.exist?("/usr/bin/resolvectl")
puts "IP address changed — flushing Linux DNS cache..."
- system("sudo systemd-resolve --flush-caches 2>/dev/null || sudo resolvectl flush-caches 2>/dev/null")
+ unless system("sudo -n systemd-resolve --flush-caches 2>/dev/null || sudo -n resolvectl flush-caches 2>/dev/null")
+ puts "⚠️ Could not flush DNS cache (sudo required). Run manually:"
+ puts " sudo systemd-resolve --flush-caches"
+ end
else
puts "IP address changed — you may need to flush your local DNS cache manually."
end
end🤖 Prompt for AI Agents
In `@lib/kamal_deployment_manager.rb` around lines 174 - 189, The
flush_local_dns_cache method uses system("sudo ...") which can block for a
password; modify flush_local_dns_cache to attempt non-interactive sudo (use sudo
-n or sudo -v check) when invoking the macOS and Linux commands (the branches
that call system with dscacheutil/killall and systemd-resolve/resolvectl),
detect failure/exit status indicating a password prompt, and instead log/warn
that DNS flush was skipped and advise the user to run the commands manually or
enable passwordless sudo; preserve the existing RUBY_PLATFORM and file-existence
checks and only change how the system calls are invoked and handled so the
deployment flow never blocks waiting for interactive sudo.
| KEEP_IP=false | ||
| for arg in "$@"; do | ||
| case $arg in | ||
| --keep-ip) | ||
| KEEP_IP=true | ||
| ;; | ||
| --help|-h) | ||
| echo "Usage: tear-down [--keep-ip]" | ||
| echo "" | ||
| echo "Stops the Kamal app and destroys all Terraform infrastructure." | ||
| echo "" | ||
| echo "Options:" | ||
| echo " --keep-ip Preserve the static IP (~\$0.01/hr while unattached)" | ||
| echo " so DNS and deploy.yml stay stable across cycles" | ||
| exit 0 | ||
| ;; | ||
| esac | ||
| done |
There was a problem hiding this comment.
Typos in --keep-ip silently destroy the IP — consider rejecting unknown flags.
If a user types tear-down --kepe-ip (typo), the flag is silently ignored and KEEP_IP stays false, causing the static IP to be destroyed. Adding a catch-all * case that aborts with a usage message would prevent this.
Proposed fix
for arg in "$@"; do
case $arg in
--keep-ip)
KEEP_IP=true
;;
--help|-h)
echo "Usage: tear-down [--keep-ip]"
echo ""
echo "Stops the Kamal app and destroys all Terraform infrastructure."
echo ""
echo "Options:"
echo " --keep-ip Preserve the static IP (~\$0.01/hr while unattached)"
echo " so DNS and deploy.yml stay stable across cycles"
exit 0
;;
+ *)
+ echo "Unknown option: $arg"
+ echo "Usage: tear-down [--keep-ip]"
+ exit 1
+ ;;
esac
done📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| KEEP_IP=false | |
| for arg in "$@"; do | |
| case $arg in | |
| --keep-ip) | |
| KEEP_IP=true | |
| ;; | |
| --help|-h) | |
| echo "Usage: tear-down [--keep-ip]" | |
| echo "" | |
| echo "Stops the Kamal app and destroys all Terraform infrastructure." | |
| echo "" | |
| echo "Options:" | |
| echo " --keep-ip Preserve the static IP (~\$0.01/hr while unattached)" | |
| echo " so DNS and deploy.yml stay stable across cycles" | |
| exit 0 | |
| ;; | |
| esac | |
| done | |
| KEEP_IP=false | |
| for arg in "$@"; do | |
| case $arg in | |
| --keep-ip) | |
| KEEP_IP=true | |
| ;; | |
| --help|-h) | |
| echo "Usage: tear-down [--keep-ip]" | |
| echo "" | |
| echo "Stops the Kamal app and destroys all Terraform infrastructure." | |
| echo "" | |
| echo "Options:" | |
| echo " --keep-ip Preserve the static IP (~\$0.01/hr while unattached)" | |
| echo " so DNS and deploy.yml stay stable across cycles" | |
| exit 0 | |
| ;; | |
| *) | |
| echo "Unknown option: $arg" | |
| echo "Usage: tear-down [--keep-ip]" | |
| exit 1 | |
| ;; | |
| esac | |
| done |
🤖 Prompt for AI Agents
In `@terraform-gcloud/bin/tear-down` around lines 17 - 34, The flag parsing loop
that sets KEEP_IP in the script currently ignores unknown flags (for arg in
"$@"; case ... esac) so typos like --kepe-ip silently leave KEEP_IP=false;
modify the case in the loop that checks --keep-ip and --help|-h to add a
catch-all (*) branch that prints the usage/error and exits non‑zero; reference
the existing KEEP_IP variable and the for arg in "$@" / case $arg in construct
and ensure the new branch echoes the usage (same text as --help) and exits 1 to
prevent accidental IP destruction.
Summary
terraform destroy, preserving the IP reservation in GCP across destroy/apply cyclesDeploymentManager) attempts to import an existing static IP beforeterraform apply, reusing the same address--release-ipflag to tear-down for when users are truly done and want to release the IPHow it works
tear-down(default): runsterraform state rm google_compute_address.rails_app_ipbefore destroy — the IP stays in GCPtear-down --release-ip: skips the state removal, so destroy releases the IP toostand-up: beforeterraform apply, checks if a static IP namedrails-app-ipexists in GCP and imports it into stateThis eliminates the most common source of noisy diffs (IP changes in
deploy.yml) and removes the need to update DNS after each destroy/apply cycle.Fixes #56
Test plan
tear-downand verify static IP is preserved in GCP (gcloud compute addresses list)stand-upand verify the preserved IP is imported and reuseddeploy.ymlIP doesn't change across the cycletear-down --release-ipand verify IP is fully releasedstand-upon fresh state (no existing IP) and verify a new IP is created🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
--keep-ipflag to preserve static IP across deployment cycles, maintaining DNS and configuration stabilityDocumentation
Improvements