Skip to content

Conversation

@manavgup
Copy link
Owner

Fixes #206. This PR fixes a 502 Bad Gateway error that occurs when the backend container is restarted. The nginx configuration is updated to dynamically resolve the backend's IP address, preventing the use of a stale, cached IP.

@github-actions
Copy link
Contributor

🚀 Development Environment Options

This repository supports Dev Containers for a consistent development environment.

Option 1: GitHub Codespaces (Recommended)

Create a cloud-based development environment:

  1. Click the green Code button above
  2. Select the Codespaces tab
  3. Click Create codespace on fix/issue-206-nginx-502-error
  4. Wait 2-3 minutes for environment setup
  5. Start coding with all tools pre-configured!

Option 2: VS Code Dev Containers (Local)

Use Dev Containers on your local machine:

  1. Install Docker Desktop
  2. Install VS Code
  3. Install the Dev Containers extension
  4. Clone this PR branch locally
  5. Open in VS Code and click "Reopen in Container" when prompted

Option 3: Traditional Local Setup

Set up the development environment manually:

# Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout fix/issue-206-nginx-502-error

# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate

Available Commands

Once in your development environment:

make help           # Show all available commands
make dev-validate   # Validate environment setup
make test-atomic    # Run atomic tests
make test-unit      # Run unit tests
make lint          # Run linting

Services Available

When running make dev-up:


This automated message helps reviewers quickly set up the development environment.

@github-actions
Copy link
Contributor

Code Review: Fix nginx 502 errors with dynamic backend resolution

Summary

This PR addresses issue #206 by implementing dynamic DNS resolution in nginx to prevent 502 Bad Gateway errors when the backend container is restarted. The solution uses Docker's internal DNS resolver to dynamically resolve the backend service name.


✅ Strengths

  1. Correct Root Cause Fix: The solution properly addresses nginx's DNS caching behavior by forcing dynamic resolution using a variable ($backend_host).

  2. Docker-Specific Solution: Uses 127.0.0.11 which is Docker's embedded DNS server - the right approach for Docker Compose environments.

  3. Reasonable Cache TTL: The valid=10s parameter balances between DNS query overhead and stale IP detection.

  4. Minimal Changes: Small, focused change that doesn't introduce unnecessary complexity.


⚠️ Issues & Recommendations

1. CRITICAL: Inconsistent Configuration 🔴

The /api/auth/callback location block (line 70) still uses the old static resolution pattern:

location /api/auth/callback {
    proxy_pass http://backend:8000/api/auth/callback$is_args$args;
    # ...
}

Impact: Authentication callbacks will still experience 502 errors after backend restarts.

Fix: Apply the same dynamic resolution pattern:

location /api/auth/callback {
    resolver 127.0.0.11 valid=10s;
    set $backend_host backend;
    proxy_pass http://$backend_host:8000/api/auth/callback$is_args$args;
    # ... rest of config
}

2. Consider Upstream Block for Better Performance 🟡

The current solution works but could be optimized using an upstream block:

upstream backend_upstream {
    # Let nginx handle service discovery
    server backend:8000;
    keepalive 32;
}

server {
    resolver 127.0.0.11 valid=10s ipv6=off;
    
    location /api/ {
        proxy_pass http://backend_upstream;
        # ... headers
    }
    
    location /api/auth/callback {
        proxy_pass http://backend_upstream/api/auth/callback$is_args$args;
        # ... headers
    }
}

Benefits:

  • Connection pooling (keepalive 32)
  • Single resolver directive
  • Easier to add load balancing later
  • Better connection reuse

3. Missing Error Handling 🟡

Add error handling for DNS resolution failures:

location /api/ {
    resolver 127.0.0.11 valid=10s;
    resolver_timeout 5s;  # Add timeout
    set $backend_host backend;
    
    proxy_pass http://$backend_host:8000;
    proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
    # ...
}

4. Documentation Gap 🟢

Consider adding an inline comment explaining why dynamic resolution is needed:

# Dynamic DNS resolution prevents 502 errors when backend container restarts.
# Docker's embedded DNS (127.0.0.11) resolves 'backend' service name dynamically.
# The variable $backend_host forces nginx to re-resolve on each request cycle.
resolver 127.0.0.11 valid=10s;
set $backend_host backend;
proxy_pass http://$backend_host:8000;

5. Testing Considerations 🟢

No automated tests for this infrastructure change. Consider:

  • Adding integration test that restarts backend and verifies connectivity
  • Document manual testing procedure in PR description
  • Add to docs/testing/infrastructure.md if it exists

🔒 Security Review

✅ No security concerns:

  • 127.0.0.11 is Docker's internal DNS, not exposed externally
  • No new attack surface introduced
  • CSP headers remain intact
  • CORS configuration unchanged

📊 Performance Considerations

Current Approach:

  • DNS lookup on every request (cached for 10s)
  • Minimal overhead (~1-2ms per DNS query)
  • No connection pooling

Recommendation: Consider upstream block (see #2) for production workloads with high traffic.


📝 Suggested Action Items

Before Merge (High Priority):

  1. ✅ Apply same fix to /api/auth/callback location block
  2. ✅ Add resolver_timeout 5s
  3. ✅ Add inline documentation comments

Follow-up PRs (Optional):

  1. Consider upstream block refactoring for better performance
  2. Add integration test for backend restart scenario
  3. Document in infrastructure testing guide

🎯 Verdict

Conditional Approval - The fix is sound and addresses the root cause, but needs to be applied consistently to the /api/auth/callback location block to prevent authentication failures.

Once the callback location is fixed, this is ready to merge. Great work identifying and fixing this DNS caching issue! 👍


📚 References

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Backend container loses connection to frontend after restart causing 502 Bad Gateway

2 participants