🚀 Extended Bedrock Batch Orchestrator with Multi-Stage Pipeline Support and MultiModality #654

jantmer · 2025-11-03T10:11:39Z

🚀 Extended Bedrock Batch Orchestrator with Multi-Stage Pipeline Support

Summary

This PR extends the Bedrock Batch Orchestrator to support multi-stage pipeline configurations, enabling batch inference workflows where multiple stages can be chained together. It also adds multimodal support for image processing and includes Amazon Nova model compatibility.

🎯 Key Features

Multi-Stage Pipeline Architecture

Pipeline Chaining: Chain multiple batch inference stages together, where each stage uses outputs from previous stages
Transform Stage Logic: New transform_stage.py Lambda function handles data flow between pipeline stages
Pipeline Validation: Added validate_pipeline_config.py to ensure pipeline configurations are valid before execution
Event-Driven Orchestration: Enhanced Step Functions workflow to manage multi-stage pipelines with automatic stage transitions

Enhanced Preprocessing & Prompt Matching

Multimodal Support: Extended preprocessor to handle both text and image inputs
Category-Based Routing: Intelligent prompt matching logic that routes records to different prompts based on categories
Expansion Rules: Support for generating multiple prompts per input record
Flexible Prompt Modes: Single prompt, mapped prompts, and expansion-based prompt generation

Improved Postprocessing

Structured Output Extraction: Automatically extract specific fields from JSON responses using output schemas
Data Preservation: Maintain original data across pipeline stages for final aggregation
Enhanced Result Parsing: Better handling of complex JSON responses and error cases

Notification System

Email Notifications: Optional email alerts when pipelines complete
Presigned URLs: Automatic generation of S3 presigned URLs for easy result downloads
Configurable Expiry: Control how long download links remain valid

Amazon Nova Model Support

Added prompt templates and configurations for Amazon Nova models
Updated model compatibility across the orchestrator

📊 Changes by Component

Infrastructure (CDK)

Enhanced Step Functions state machine for multi-stage pipeline orchestration
Added new Lambda functions: transform_stage, send_notification, validate_pipeline_config
Updated IAM permissions for cross-stage data access
Added SNS topic for email notifications (optional)

Lambda Functions

preprocess.py: +429 lines - Multimodal support, category-based routing
postprocess.py: +282 lines - Structured output extraction, enhanced parsing
processor.py: +204 lines - Multi-stage job management
prompt_templates.py: +496 lines - New templates including Nova models
custom_types.py: +76 lines - Enhanced type definitions for pipelines

Configuration & Examples

Added sample pipeline configs: clothing-analysis-full.json, clothing-analysis-test.json
Included sample JSONL files demonstrating multi-stage workflows
Added 100+ sample clothing images for testing multimodal capabilities

Documentation

Comprehensive README update (+428 lines) with:
- Multi-stage pipeline architecture diagrams
- Detailed usage examples for both single and multi-stage processing
- Configuration guides for various use cases
- Troubleshooting section

📈 Statistics

123 files changed
3,571 insertions, 145 deletions
New Lambda functions: 3
New pipeline configs: 2
Sample images added: 100+

🔧 Configuration

Multi-stage pipelines are configured via JSON files in pipeline-configs/:

{
  "stages": [
    {
      "stage_name": "stage_1",
      "prompt_mode": "single",
      "model_id": "amazon.nova-lite-v1:0",
      "output_schema": {...}
    }
  ]
}

🎨 Use Cases Enabled

Multi-step Analysis: Break complex analysis into sequential stages
Category-Specific Processing: Route different data types to specialized prompts
Multimodal Workflows: Process images with text descriptions
Structured Data Extraction: Extract specific fields from unstructured responses
Large-Scale Batch Jobs: Process thousands of records with 50% cost savings

⚙️ Breaking Changes

None - fully backward compatible with existing single-stage configurations.

🧪 Testing

Includes sample configurations and test data for:

Single-stage text generation
Multi-stage clothing analysis pipeline
Multimodal image + text processing

📝 Notes

Email notifications are optional and require configuration in cdk.json
Presigned URL expiry is configurable per pipeline
All existing single-stage functionality remains unchanged

…ine configuration. Updated orchestrator preprocessor to support images, and extended the prompt matching logic. Added support for Amazon Nova Models.

Extended bedrock batch orchestrator to be used in a multi-stage pipel…

a595113

…ine configuration. Updated orchestrator preprocessor to support images, and extended the prompt matching logic. Added support for Amazon Nova Models.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🚀 Extended Bedrock Batch Orchestrator with Multi-Stage Pipeline Support and MultiModality #654

🚀 Extended Bedrock Batch Orchestrator with Multi-Stage Pipeline Support and MultiModality #654

Uh oh!

jantmer commented Nov 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

🚀 Extended Bedrock Batch Orchestrator with Multi-Stage Pipeline Support and MultiModality #654

Are you sure you want to change the base?

🚀 Extended Bedrock Batch Orchestrator with Multi-Stage Pipeline Support and MultiModality #654

Uh oh!

Conversation

jantmer commented Nov 3, 2025