Cnt scalability #316
Open
yl-nuwan wants to merge 45 commits into
…moDB support
- Separate agent runner execution role (image pull, logs) from task role (SQS, DynamoDB access)
- Add CloudWatch Logs policy to agent runner task role for container log writes
- Add DynamoDB memory table access policy for agent runner task role
- Update ECS task definition to use execution role for image operations
- Add comprehensive docstring updates to akagentrunner.py for boto3 and Lambda message format handling
- Update sqs_handler.py to handle both Lambda camelCase and boto3 PascalCase attribute keys
- Add queue-mode-guide.md documentation for queue mode deployment and configuration
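A minimal sketch of reading one SQS message in either shape. Lambda SQS event records use camelCase keys (`body`, `messageAttributes`, `stringValue`), while boto3 `receive_message` returns PascalCase keys (`Body`, `MessageAttributes`, `StringValue`). The helper names and the `request_id` attribute name are illustrative assumptions, not the actual contents of sqs_handler.py.

```python
from typing import Any, Optional


def get_body(message: dict[str, Any]) -> str:
    """Return the body from a Lambda record ("body") or a boto3 message ("Body")."""
    return message.get("body", message.get("Body", ""))


def get_string_attribute(message: dict[str, Any], name: str) -> Optional[str]:
    """Look up a String message attribute regardless of key casing."""
    # Lambda event records use camelCase; boto3 receive_message uses PascalCase.
    attributes = message.get("messageAttributes") or message.get("MessageAttributes") or {}
    attribute = attributes.get(name)
    if attribute is None:
        return None
    return attribute.get("stringValue", attribute.get("StringValue"))
```

Normalizing at the edge like this lets the rest of the runner treat both delivery paths identically.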
…ents
- Add ECSQueueRequestHandler class that bypasses ChatService and directly enqueues requests to SQS
- Implement sync mode (REST_SYNC) to wait for responses in DynamoDB Response Store
- Implement async mode (REST_ASYNC) to return request_id for polling
- Add GET /api/v1/chat/{session_id} endpoint for async response polling
- Update ECSRESTService to use queue-aware handler instead of default REST API
- Export ECSQueueRequestHandler from containerized module __init__.py
- Update example app_rest_service.py to demonstrate queue-based request handling
- Enables scalable ECS deployments with asynchronous agent execution and DynamoDB response storage
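The sync/async split above can be sketched as follows. A plain dict keyed by request_id stands in for the DynamoDB Response Store, and `enqueue` stands in for the SQS send; `handle_chat`, `wait_for_response`, and the status strings are hypothetical names chosen for illustration, not the ECSQueueRequestHandler API.

```python
import time
import uuid
from typing import Any, Callable, Optional

# Hypothetical in-memory stand-in for the DynamoDB Response Store, keyed by request_id.
ResponseStore = dict[str, dict[str, Any]]


def wait_for_response(store: ResponseStore, request_id: str,
                      timeout_s: float = 30.0, interval_s: float = 0.5) -> Optional[dict[str, Any]]:
    """Poll the store until a response for request_id appears or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = store.get(request_id)
        if response is not None:
            return response
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


def handle_chat(mode: str, session_id: str, prompt: str,
                enqueue: Callable[[dict[str, Any]], None], store: ResponseStore,
                timeout_s: float = 30.0) -> dict[str, Any]:
    """Enqueue a request; REST_SYNC waits for the answer, REST_ASYNC returns the id to poll."""
    request_id = str(uuid.uuid4())
    enqueue({"request_id": request_id, "session_id": session_id, "prompt": prompt})
    if mode == "REST_ASYNC":
        # The client polls GET /api/v1/chat/{session_id} later.
        return {"status": "ACCEPTED", "request_id": request_id, "session_id": session_id}
    response = wait_for_response(store, request_id, timeout_s)
    if response is None:
        return {"status": "TIMEOUT", "request_id": request_id, "session_id": session_id}
    return {**response, "status": "COMPLETED", "request_id": request_id}
```

In the real deployment the agent runner writes to the store asynchronously; here a test can simulate it by having `enqueue` write the response directly.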
…submission
- Change message_attributes dict parameter to request_id positional argument
- Align with SQS API expectations for FIFO queue message attributes
- Simplify queue message construction by using native request_id parameter
- Maintains backward compatibility with existing queue processing logic
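One way to build a FIFO `send_message` call from a request_id is sketched below. The keyword names (`QueueUrl`, `MessageBody`, `MessageGroupId`, `MessageDeduplicationId`, `MessageAttributes`) are real boto3 SQS parameters; grouping by session_id and reusing request_id as the deduplication id are assumptions, and `build_send_message_params` is a hypothetical helper.

```python
import json
from typing import Any


def build_send_message_params(queue_url: str, payload: dict[str, Any],
                              request_id: str, session_id: str) -> dict[str, Any]:
    """Build keyword arguments for boto3 SQS send_message on a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(payload),
        # Assumption: group by session so messages within one conversation stay ordered.
        "MessageGroupId": session_id,
        # Assumption: a unique request_id doubles as the FIFO deduplication id.
        "MessageDeduplicationId": request_id,
        "MessageAttributes": {
            "request_id": {"DataType": "String", "StringValue": request_id},
        },
    }
```

Usage would look like `boto3.client("sqs").send_message(**params)`.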
…ecycle
- Add detailed lifecycle logging with stage markers ([AGENT START], [AGENT PROCESSING], [AGENT RESPONSE], [AGENT DONE]) to akagentrunner.py for improved traceability
- Add structured logging for output processing pipeline ([OUTPUT START], [OUTPUT STORE], [OUTPUT DONE]) in akrestservice.py with request/session tracking
- Generate unique request_id using uuid.uuid4() instead of falling back to session_id in ecs_queue_handler.py for proper request isolation
- Add comprehensive logging at each stage of request lifecycle ([REQUEST START], [ENQUEUED], [WAITING], [RESPONSE FOUND], [WAIT START], [WAIT SUCCESS], [WAIT RETRY], [WAIT TIMEOUT]) in ecs_queue_handler.py
- Add debug_response_store.py utility script for inspecting DynamoDB response store state during development
- Include request_id, session_id, agent, and prompt preview in logs for better debugging and tracing across async/sync execution modes
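A small sketch of the stage-marker logging pattern, assuming one formatted line per stage with a truncated prompt preview. The logger name, `log_stage`, and the 50-character preview length are hypothetical choices; only the bracketed stage names and the use of uuid.uuid4() come from the commit.

```python
import logging
import uuid

logger = logging.getLogger("ecs_queue_handler")


def new_request_id() -> str:
    """Mint a fresh id instead of reusing session_id, so requests in one session stay distinct."""
    return str(uuid.uuid4())


def log_stage(stage: str, request_id: str, session_id: str,
              agent: str = "", prompt: str = "") -> str:
    """Emit and return one lifecycle line, e.g. '[REQUEST START] request_id=... session_id=...'."""
    preview = prompt[:50] + ("..." if len(prompt) > 50 else "")
    message = f"[{stage}] request_id={request_id} session_id={session_id} agent={agent} prompt={preview!r}"
    logger.info(message)
    return message
```

Because every line carries both ids, grepping for one request_id reconstructs a request's path across the REST service and the agent runner.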
…GET integration
- Fix path parameter mapping from hardcoded sessionId to dynamic $request.path.sessionId
- Add blank line for improved readability in resource configuration
- Ensure API Gateway correctly forwards session ID from request path to backend service
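To illustrate why the hardcoded mapping was a bug, here is a local simulation of how a `$request.path.<name>` reference resolves against the incoming request's path parameters. This is not API Gateway code; the `/api/v1/chat/...` template and `resolve_path_mapping` are assumptions made for illustration.

```python
import re

# Matches mapping references of the form $request.path.<name>.
_PATH_REF = re.compile(r"\$request\.path\.([A-Za-z0-9_]+)")


def resolve_path_mapping(template: str, path_params: dict[str, str]) -> str:
    """Replace each $request.path.<name> reference with the request's path parameter value."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in path_params:
            raise KeyError(f"path parameter {name!r} not present in request")
        return path_params[name]
    return _PATH_REF.sub(replace, template)
```

With the dynamic reference, the backend receives the caller's real session ID; a literal `sessionId` in the template would be forwarded unchanged for every request.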
…ant comments
- Rename sqs.tf to queue.tf for clearer module scope
- Remove SQS Queue Mode header comment (redundant with file purpose)
- Remove IAM section header comment (redundant with resource names)
- Remove CloudWatch Logs section comment (redundant with resource names)
- Improve code clarity by reducing comment clutter while maintaining resource documentation
…ple README
- Add comprehensive Queue Mode section to containerized module README with architecture diagram and configuration example
- Document scalable queue mode use cases and processing architecture (REST Service threads + Agent Runner)
- Add Queue Mode input variables section covering SQS visibility timeouts and Agent Runner configuration
- Restructure openai-dynamodb-scalable example README with improved architecture overview and deployed resources documentation
- Remove clean.sh and rebuild.sh scripts from scalable example (moved to root)
- Add .terraform.lock.hcl file to deploy directory for reproducible Terraform deployments
- Provides clear guidance for implementing high-throughput, asynchronously-processed agent workloads
- Convert single quotes to double quotes for string literals
- Consolidate multi-line string concatenations to single lines
- Remove trailing whitespace and normalize blank lines
- Simplify multi-line function calls and error messages
- Apply consistent formatting across ECSAgentRunner, ECSRESTService, ECS queue handler, and SQS poller modules
- Ensure consistent code style across containerized deployment infrastructure
- Add session_id parameter validation in ECS queue handler GET endpoint
- Implement security check to verify response session_id matches URL path session_id
- Add session_id validation in serverless Lambda REST_ASYNC polling operation
- Enhance logging to include session_id in poll operation for better traceability
- Return 403 Forbidden with detailed error when session_id mismatch is detected
- Prevent unauthorized access to responses belonging to different sessions
- Improve error messages to distinguish between missing and mismatched session IDs
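A minimal sketch of the ownership check described in this commit: the stored response's session_id must match the session_id from the URL path, otherwise the caller gets 403. `authorize_poll`, the error payload shape, and returning 400 for a missing path session_id are illustrative assumptions.

```python
from typing import Any


def authorize_poll(stored: dict[str, Any], path_session_id: str) -> tuple[int, dict[str, Any]]:
    """Return (status, body) after checking that the response belongs to the caller's session."""
    if not path_session_id:
        # Assumption: a missing session_id is reported as a client error.
        return 400, {"error": "BAD_REQUEST", "detail": "session_id is required in the path"}
    if stored.get("session_id") != path_session_id:
        # Do not echo the stored session_id back; that would leak another session's id.
        return 403, {"error": "FORBIDDEN",
                     "detail": f"response does not belong to session {path_session_id!r}"}
    return 200, stored
```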
…queue polling
- Add validation to require either request_id or session_id in GET polling requests
- Return 404 with detailed error message when no response is found instead of PENDING status
- Update error response format for session ID mismatch to use FORBIDDEN status code
- Include request_id and session_id in error response details for better debugging
- Enhance error messages with context about message unavailability and retry guidance
…ponses
- Change HTTP status from 403 (FORBIDDEN) to 404 (NOT_FOUND) in ECS queue handler when response message is not found
- Update error message to indicate message unavailability rather than session mismatch
- Add request_id to error detail in ECS queue handler response
- Align serverless Lambda router error response with queue handler for consistency
- Improve error messaging clarity for clients when async response messages cannot be located
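The polling behavior after these two commits might look roughly like this: one of request_id or session_id is required, and a missing response yields 404 with both ids and retry guidance in the detail. Treating a session mismatch as "not available" is one reading of the last commit, not a confirmed detail; the store shape, `poll_response`, and the 400 status for missing ids are assumptions.

```python
from typing import Any, Optional

Store = dict[str, dict[str, Any]]  # hypothetical: request_id -> stored response record


def poll_response(store: Store, request_id: Optional[str],
                  session_id: Optional[str]) -> tuple[int, dict[str, Any]]:
    """Resolve a GET poll to (status, body)."""
    if not request_id and not session_id:
        # Assumption: 400; the commit only states that one of the two ids is required.
        return 400, {"error": "BAD_REQUEST", "detail": "Provide request_id or session_id."}
    if request_id:
        record = store.get(request_id)
    else:
        record = next((r for r in store.values() if r.get("session_id") == session_id), None)
    # A record from another session is reported as unavailable rather than as a mismatch.
    if record is not None and session_id and record.get("session_id") != session_id:
        record = None
    if record is None:
        return 404, {
            "error": "NOT_FOUND",
            "detail": {
                "message": "Response message is not available yet; retry later.",
                "request_id": request_id,
                "session_id": session_id,
            },
        }
    return 200, record
```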
- Add scaling.tf with Lambda-based BacklogPerTask metric calculation
- Implement EventBridge trigger for metric computation every minute
- Add target tracking scaling policy for Agent Runner ECS service
- Add autoscaling configuration variables to variables.tf
- Document autoscaling parameters and usage in README
- Enable configurable min/max task counts and scale in/out cooldown periods
- Add validation to require queue_mode when autoscaling is enabled
- Update example deployment to demonstrate autoscaling configuration
- Supports both sync and async queue modes with automatic task scaling based on queue depth
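The arithmetic behind a BacklogPerTask metric is sketched below, assuming the inputs are the queue's ApproximateNumberOfMessagesVisible (from SQS GetQueueAttributes) and the service's runningCount (from ECS DescribeServices). `desired_task_count` only approximates what a target tracking policy converges to; it is not how Application Auto Scaling computes capacity internally, and all function names are hypothetical.

```python
import math


def backlog_per_task(visible_messages: int, running_tasks: int) -> float:
    """Queue depth divided by running tasks; with zero tasks, report the raw backlog."""
    return visible_messages / max(running_tasks, 1)


def desired_task_count(visible_messages: int, target_backlog_per_task: float,
                       min_tasks: int, max_tasks: int) -> int:
    """Approximate the task count that brings BacklogPerTask to the target, clamped to [min, max]."""
    if target_backlog_per_task <= 0:
        raise ValueError("target_backlog_per_task must be positive")
    needed = math.ceil(visible_messages / target_backlog_per_task)
    return max(min_tasks, min(max_tasks, needed))
```

A scheduled Lambda could publish the first value with CloudWatch `put_metric_data` (the namespace would be deployment-specific), and the target tracking policy then adjusts the Agent Runner task count between the configured min and max.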
… and configuration files
…es for conditional creation
Description
Type of Change
Related Issues
Fixes #
Relates to #
Changes Made
Testing
Checklist
Screenshots (if applicable)
Additional Notes