@jantmer jantmer commented Nov 3, 2025

🚀 Extended Bedrock Batch Orchestrator with Multi-Stage Pipeline Support

Summary

This PR extends the Bedrock Batch Orchestrator to support multi-stage pipeline configurations, enabling batch inference workflows where multiple stages can be chained together. It also adds multimodal support for image processing and includes Amazon Nova model compatibility.

🎯 Key Features

Multi-Stage Pipeline Architecture

  • Pipeline Chaining: Chain multiple batch inference stages together, where each stage uses outputs from previous stages
  • Transform Stage Logic: New transform_stage.py Lambda function handles data flow between pipeline stages
  • Pipeline Validation: Added validate_pipeline_config.py to ensure pipeline configurations are valid before execution
  • Event-Driven Orchestration: Enhanced Step Functions workflow to manage multi-stage pipelines with automatic stage transitions
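As a rough illustration of the chaining idea above, a transform step might read one stage's JSONL output and emit the next stage's JSONL input. This is only a sketch: the record fields (record_id, output, original, error) and the function name are hypothetical and are not taken from transform_stage.py.

```python
import json
from typing import Iterable, Iterator


def transform_stage_output(
    output_lines: Iterable[str], next_prompt_template: str
) -> Iterator[str]:
    """Turn one stage's JSONL output into the next stage's JSONL input.

    The record layout used here is an assumption made for illustration.
    """
    for line in output_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL file
        record = json.loads(line)
        if record.get("error"):
            continue  # failed records are not forwarded to the next stage
        next_record = {
            "record_id": record["record_id"],
            # Carry the original data forward so a later stage can aggregate it.
            "original": record.get("original", {}),
            "previous_output": record["output"],
            "prompt": next_prompt_template.format(
                previous_output=record["output"]
            ),
        }
        yield json.dumps(next_record)
```

Writing this as a generator keeps memory use flat for large batch files, since records are transformed one line at a time.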

Enhanced Preprocessing & Prompt Matching

  • Multimodal Support: Extended preprocessor to handle both text and image inputs
  • Category-Based Routing: Intelligent prompt matching logic that routes records to different prompts based on categories
  • Expansion Rules: Support for generating multiple prompts per input record
  • Flexible Prompt Modes: Single prompt, mapped prompts, and expansion-based prompt generation
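The three prompt modes listed above could be dispatched roughly as follows. This is a sketch under assumptions: only the "single" mode value appears in the sample config in this PR description; the "mapped" and "expansion" values and the keys prompt_id, category_to_prompt, default_prompt_id, and expansion_prompt_ids are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List


def select_prompts(record: Dict[str, Any], stage: Dict[str, Any]) -> List[str]:
    """Return the prompt IDs to run for one input record.

    Key names other than "prompt_mode" are assumptions, not the PR's schema.
    """
    mode = stage["prompt_mode"]
    if mode == "single":
        # Every record gets the same prompt.
        return [stage["prompt_id"]]
    if mode == "mapped":
        # Route by category, falling back to a default prompt.
        category = record.get("category", "")
        mapping = stage["category_to_prompt"]
        return [mapping.get(category, stage["default_prompt_id"])]
    if mode == "expansion":
        # One input record fans out into several prompts.
        return list(stage["expansion_prompt_ids"])
    raise ValueError(f"Unknown prompt_mode: {mode!r}")
```

Returning a list in every mode lets the caller treat single, mapped, and expansion stages uniformly: it emits one model request per returned prompt ID.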

Improved Postprocessing

  • Structured Output Extraction: Automatically extract specific fields from JSON responses using output schemas
  • Data Preservation: Maintain original data across pipeline stages for final aggregation
  • Enhanced Result Parsing: Better handling of complex JSON responses and error cases
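To make the structured extraction concrete, here is a minimal sketch of pulling schema fields out of a model response that may wrap its JSON in extra text. The function name, the parse_error key, and the assumption that output_schema is a mapping from field name to a type description are all hypothetical; postprocess.py may work differently.

```python
import json
from typing import Any, Dict


def extract_fields(response_text: str, output_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields named in output_schema from a model response.

    Assumes the response contains a single JSON object, possibly surrounded
    by prose. Fields missing from the response come back as None.
    """
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end <= start:
        return {"parse_error": "no JSON object found"}
    try:
        parsed = json.loads(response_text[start : end + 1])
    except json.JSONDecodeError as exc:
        # Surface the failure instead of raising, so one bad record
        # does not abort the whole batch.
        return {"parse_error": str(exc)}
    return {field: parsed.get(field) for field in output_schema}
```

Returning an error marker rather than raising is a design choice for batch workloads, where a handful of malformed responses are expected and should be reported alongside the successful records.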

Notification System

  • Email Notifications: Optional email alerts when pipelines complete
  • Presigned URLs: Automatic generation of S3 presigned URLs for easy result downloads
  • Configurable Expiry: Control how long download links remain valid
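For reference, a download link with a configurable expiry can be produced with the boto3 S3 client's generate_presigned_url call. The wrapper below is a sketch only: the function name and the 3600-second default are assumptions, and the client is passed in so the helper can be exercised without AWS credentials.

```python
from typing import Any


def build_download_link(
    s3_client: Any, bucket: str, key: str, expiry_seconds: int = 3600
) -> str:
    """Return a presigned GET URL for an S3 object.

    s3_client is expected to be a boto3 S3 client (boto3.client("s3")).
    The default expiry is an illustrative value, not the PR's default.
    """
    if expiry_seconds <= 0:
        raise ValueError("expiry_seconds must be positive")
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expiry_seconds,
    )
```

A notification Lambda could call this once per result file and place the returned URLs in the email body.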

Amazon Nova Model Support

  • Added prompt templates and configurations for Amazon Nova models
  • Updated model compatibility across the orchestrator

📊 Changes by Component

Infrastructure (CDK)

  • Enhanced Step Functions state machine for multi-stage pipeline orchestration
  • Added new Lambda functions: transform_stage, send_notification, validate_pipeline_config
  • Updated IAM permissions for cross-stage data access
  • Added SNS topic for email notifications (optional)

Lambda Functions

  • preprocess.py: +429 lines - Multimodal support, category-based routing
  • postprocess.py: +282 lines - Structured output extraction, enhanced parsing
  • processor.py: +204 lines - Multi-stage job management
  • prompt_templates.py: +496 lines - New templates including Nova models
  • custom_types.py: +76 lines - Enhanced type definitions for pipelines

Configuration & Examples

  • Added sample pipeline configs: clothing-analysis-full.json, clothing-analysis-test.json
  • Included sample JSONL files demonstrating multi-stage workflows
  • Added 100+ sample clothing images for testing multimodal capabilities

Documentation

  • Comprehensive README update (+428 lines) with:
    • Multi-stage pipeline architecture diagrams
    • Detailed usage examples for both single and multi-stage processing
    • Configuration guides for various use cases
    • Troubleshooting section

📈 Statistics

  • 123 files changed
  • 3,571 insertions, 145 deletions
  • New Lambda functions: 3
  • New pipeline configs: 2
  • Sample images added: 100+

🔧 Configuration

Multi-stage pipelines are configured via JSON files in pipeline-configs/:

{
  "stages": [
    {
      "stage_name": "stage_1",
      "prompt_mode": "single",
      "model_id": "amazon.nova-lite-v1:0",
      "output_schema": {...}
    }
  ]
}
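A pre-flight check over a config of this shape might look like the sketch below. It uses only the field names shown in the example above; which fields are actually required, and the rule that stage names must be unique, are assumptions and may not match validate_pipeline_config.py.

```python
from typing import Any, Dict, List

# Assumption: these keys are mandatory for every stage.
REQUIRED_STAGE_KEYS = ("stage_name", "prompt_mode", "model_id")


def validate_pipeline_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the config passed."""
    stages = config.get("stages")
    if not isinstance(stages, list) or not stages:
        return ["'stages' must be a non-empty list"]

    errors: List[str] = []
    seen_names = set()
    for index, stage in enumerate(stages):
        if not isinstance(stage, dict):
            errors.append(f"stage {index}: must be an object")
            continue
        for key in REQUIRED_STAGE_KEYS:
            if key not in stage:
                errors.append(f"stage {index}: missing '{key}'")
        name = stage.get("stage_name")
        if isinstance(name, str):
            # Assumption: later stages refer to earlier ones by name,
            # so names need to be unique.
            if name in seen_names:
                errors.append(f"stage {index}: duplicate stage_name '{name}'")
            seen_names.add(name)
    return errors
```

Collecting all problems instead of failing on the first one lets a user fix a config in a single pass before any batch job is submitted.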

🎨 Use Cases Enabled

  • Multi-step Analysis: Break complex analysis into sequential stages
  • Category-Specific Processing: Route different data types to specialized prompts
  • Multimodal Workflows: Process images with text descriptions
  • Structured Data Extraction: Extract specific fields from unstructured responses
  • Large-Scale Batch Jobs: Process thousands of records at roughly 50% lower cost than on-demand inference (Bedrock batch inference pricing)

⚙️ Breaking Changes

None - fully backward compatible with existing single-stage configurations.

🧪 Testing

Includes sample configurations and test data for:

  • Single-stage text generation
  • Multi-stage clothing analysis pipeline
  • Multimodal image + text processing

📝 Notes

  • Email notifications are optional and require configuration in cdk.json
  • Presigned URL expiry is configurable per pipeline
  • All existing single-stage functionality remains unchanged

…ine configuration. Updated orchestrator preprocessor to support images, and extended the prompt matching logic. Added support for Amazon Nova Models.