Commit 1891ea8

fix: Complete PR #261 - Production-ready K8s/OpenShift deployment fixes
This commit addresses all critical and high-priority issues identified in PR #261 review comments for production-ready Kubernetes/OpenShift deployment.

## Critical Fixes (Must Fix)

1. **Remove hardcoded secrets from values.yaml**
   - Replaced all "changeme" placeholders with comprehensive documentation
   - Added guidance for 3 secure secret management methods:
     * Helm --set-string flags (CI/CD)
     * External secrets management (Vault, AWS Secrets Manager, etc.)
     * Kubernetes secret creation (dev only)
   - Documented all required and optional secret keys

2. **Add /health/ready endpoint**
   - Created new lightweight readiness probe endpoint at /api/health/ready
   - Optimized for fast response (checks only critical DB connection)
   - Separates readiness from comprehensive health checks
   - Prevents premature traffic routing to unhealthy pods

3. **Standardize health probe endpoints**
   - Fixed K8s deployment: /api/health and /api/health/ready
   - Fixed Helm deployment: /api/health and /api/health/ready
   - Added startup probe to Helm backend deployment
   - Standardized timing and failure thresholds across both deployments

4. **Fix page number return type handling**
   - Added try-except blocks to handle None values gracefully
   - Prevents TypeError when converting invalid page numbers to int
   - Added logging for debugging invalid page_no/page values

## High-Priority Enhancements

5. **Use immutable image tags**
   - Removed :latest tags from backend, frontend, and minio images
   - Updated values.yaml to require explicit version tags
   - Added documentation and examples for git SHA-based tagging
   - Changed pullPolicy to IfNotPresent for better performance

6. **Add resource quotas**
   - Created ResourceQuota manifests (K8s and Helm)
   - Set namespace-level limits: 20 CPU requests, 40Gi memory requests
   - Created LimitRange for pod/container defaults and maximums
   - Prevents cluster resource exhaustion and runaway pods

7. **Make Docling processing async**
   - Wrapped synchronous Docling converter.convert() in asyncio.to_thread()
   - Prevents blocking the async event loop during CPU-intensive AI processing
   - Improves API responsiveness during document processing

8. **Add NetworkPolicy resources**
   - Created comprehensive NetworkPolicy manifests for all components
   - Implements zero-trust networking with explicit allow rules
   - Isolates backend, frontend, databases, and storage layers
   - Restricts egress to only required services (DNS, DBs, external APIs)

## Files Changed

### Backend
- backend/rag_solution/router/health_router.py: Added /health/ready endpoint
- backend/rag_solution/data_ingestion/docling_processor.py: Async processing, page number fixes

### Kubernetes Manifests
- deployment/k8s/base/deployments/backend.yaml: Fixed health probes, image tags
- deployment/k8s/base/networkpolicy/*: Added 3 NetworkPolicy manifests
- deployment/k8s/base/resourcequota/namespace-quota.yaml: Added ResourceQuota and LimitRange

### Helm Charts
- deployment/helm/rag-modulo/values.yaml: Removed hardcoded secrets, added resource quotas, NetworkPolicy config
- deployment/helm/rag-modulo/templates/backend-deployment.yaml: Fixed health probes, added startup probe
- deployment/helm/rag-modulo/templates/resourcequota.yaml: New ResourceQuota template
- deployment/helm/rag-modulo/templates/networkpolicy.yaml: New NetworkPolicy template

## Testing Recommendations

1. Test health endpoints: curl http://backend:8000/api/health/ready
2. Verify secret management with external secrets operator
3. Test pod startup with new startup probe timing
4. Validate NetworkPolicy rules don't block legitimate traffic
5. Confirm resource quotas prevent over-allocation

Closes #261 review comments
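The "Use immutable image tags" item could be backed by a small pre-deploy check. The sketch below is only an illustration: `has_pinned_tag` is a hypothetical helper, not part of this commit, and the image names in the usage note are made-up examples. It treats a reference as pinned when it carries a digest or an explicit tag other than `latest`.

```python
def has_pinned_tag(image: str) -> bool:
    """Return True if the reference has a digest or an explicit non-latest tag.

    Hypothetical helper for illustration; the commit itself only changes
    manifests and values.yaml.
    """
    if "@" in image:
        # Digest references (repo@sha256:...) are immutable by construction.
        return True
    # Look only at the last path segment so a registry port such as
    # localhost:5000 is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    _name, separator, tag = last_segment.partition(":")
    return bool(separator) and tag not in ("", "latest")
```

With this sketch, an example reference like `ghcr.io/example/backend:1891ea8` passes, while `ghcr.io/example/backend:latest` and an untagged `minio/minio` are rejected.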
1 parent 6e67345 commit 1891ea8

53 files changed, with 3694 additions and 4 deletions

backend/rag_solution/data_ingestion/docling_processor.py

Lines changed: 13 additions & 4 deletions
@@ -6,6 +6,7 @@
 """
 
 # Standard library imports
+import asyncio
 import logging
 import os
 import uuid
@@ -119,8 +120,10 @@ async def process(self, file_path: str, document_id: str) -> AsyncIterator[Docum
         if self.converter is None:
             raise ImportError("Docling DocumentConverter not available")
 
-        # Convert document using Docling
-        result = self.converter.convert(file_path)
+        # Convert document using Docling (run in thread pool to avoid blocking event loop)
+        # Docling's AI models are CPU-intensive and can block the async event loop
+        logger.debug("Running Docling conversion in thread pool for: %s", file_path)
+        result = await asyncio.to_thread(self.converter.convert, file_path)
 
         # Extract metadata
         metadata = self._extract_docling_metadata(result.document, file_path)
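The effect of moving the conversion into `asyncio.to_thread()` can be seen in a small standalone experiment. This is a sketch under stated assumptions: `slow_convert` is a hypothetical stand-in for the blocking `converter.convert()` call (it just sleeps), and `heartbeat` is an illustrative task that shows the event loop still running while the conversion is in progress.

```python
import asyncio
import time


def slow_convert(path: str) -> str:
    """Hypothetical stand-in for a blocking converter.convert() call."""
    time.sleep(0.2)  # simulate blocking work
    return f"converted:{path}"


async def heartbeat(ticks: list[float], interval: float = 0.02) -> None:
    """Record a timestamp each time the event loop gets to run this task."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def convert_without_blocking(path: str) -> tuple[str, int]:
    """Run the blocking call in a worker thread and count heartbeat ticks."""
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    try:
        result = await asyncio.to_thread(slow_convert, path)
    finally:
        beat.cancel()
        # Wait for the cancellation to complete before returning.
        await asyncio.gather(beat, return_exceptions=True)
    return result, len(ticks)


if __name__ == "__main__":
    converted, tick_count = asyncio.run(convert_without_blocking("doc.pdf"))
    print(converted, tick_count)
```

Calling `slow_convert` directly inside the coroutine would stall the heartbeat for the whole conversion. Note that a worker thread keeps the loop responsive only to the extent that the work releases the GIL (sleeping, I/O, or native code); pure-Python CPU-bound work would still compete with the loop.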
@@ -393,10 +396,16 @@ def _get_page_number(self, item: Any) -> int | None:
             # Try new API first (page_no), fallback to old API (page)
             page_no = getattr(item.prov[0], "page_no", None)
             if page_no is not None:
-                return int(page_no)
+                try:
+                    return int(page_no)
+                except (ValueError, TypeError):
+                    logger.warning("Invalid page_no value: %s", page_no)
             page = getattr(item.prov[0], "page", None)
             if page is not None:
-                return int(page)
+                try:
+                    return int(page)
+                except (ValueError, TypeError):
+                    logger.warning("Invalid page value: %s", page)
         return None
 
     def _table_to_text(self, table_data: dict) -> str:
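The fallback behavior of the patched lookup can be exercised in isolation. The function below is a standalone approximation, not the project's method: `get_page_number` is a hypothetical free function, the guard on `prov` is an assumption (the hunk does not show the surrounding lines), and `SimpleNamespace` objects stand in for Docling items.

```python
import logging
from types import SimpleNamespace
from typing import Any, Optional

logger = logging.getLogger(__name__)


def get_page_number(item: Any) -> Optional[int]:
    """Try page_no first, then page; skip values that cannot become an int."""
    prov = getattr(item, "prov", None)
    if not prov:  # assumed guard: no provenance means no page number
        return None
    for attr in ("page_no", "page"):
        value = getattr(prov[0], attr, None)
        if value is None:
            continue
        try:
            return int(value)
        except (ValueError, TypeError):
            # Log and fall through to the next attribute, as in the patch.
            logger.warning("Invalid %s value: %s", attr, value)
    return None


if __name__ == "__main__":
    bad_then_good = SimpleNamespace(prov=[SimpleNamespace(page_no="n/a", page=7)])
    print(get_page_number(bad_then_good))  # falls back to the old attribute
```

The point of the design is that an unparseable `page_no` no longer aborts the lookup: the warning is logged and the older `page` attribute still gets a chance before the function gives up and returns `None`.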

backend/rag_solution/router/health_router.py

Lines changed: 42 additions & 0 deletions
@@ -152,3 +152,45 @@ def health_check(
         raise HTTPException(status_code=503, detail=f"System unhealthy. Components: {', '.join(unhealthy_components)}")
 
     return {"status": "healthy", "components": components}
+
+
+@router.get(
+    "/health/ready",
+    summary="Readiness probe",
+    description="Lightweight readiness check for Kubernetes readiness probe",
+    response_model=dict,
+    responses={
+        200: {"description": "Application is ready to serve traffic"},
+        503: {"description": "Application is not ready"},
+    },
+)
+def readiness_check(db: Annotated[Session, Depends(get_db)]) -> dict[str, Any]:
+    """
+    Perform a lightweight readiness check for Kubernetes readiness probe.
+
+    This endpoint is optimized for fast response times and checks only
+    critical dependencies required to serve traffic (database connection).
+    Unlike /health, it doesn't check external services like vector DB or LLM providers.
+
+    Args:
+        db: The database session.
+
+    Returns:
+        dict: Readiness status
+
+    Raises:
+        HTTPException: If the application is not ready to serve traffic
+    """
+    # Check only critical database connection
+    datastore_status = check_datastore(db)
+
+    if datastore_status["status"] == "unhealthy":
+        raise HTTPException(
+            status_code=503,
+            detail=f"Application not ready: {datastore_status['message']}"
+        )
+
+    return {
+        "status": "ready",
+        "message": "Application is ready to serve traffic"
+    }
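For testing outside a cluster, the new endpoint can be polled the way a readiness probe would poll it. The following is a minimal sketch using only the standard library; `fetch_status` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay are arbitrary illustration values, not the probe timings from the manifests. The URL in the usage note is the one given in the commit's testing recommendations.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while the database is unreachable


def wait_until_ready(
    url: str,
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Poll the readiness URL until it returns 200 or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # connection refused, DNS failure, timeout: not ready yet
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A call such as `wait_until_ready("http://backend:8000/api/health/ready")` returns `True` once the endpoint answers 200. The `fetch` parameter exists so the polling logic can be tested with a fake in place of a live backend.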
