Comprehensive Terraform modules for cost control in the NDX Innovation Sandbox AWS deployment. They implement a three-layer defense-in-depth architecture that protects against cost attacks during 24-hour sandbox leases.
- Defense Architecture
- Quick Start
- Modules
- Cost Protection Analysis
- Attack Vector Coverage
- Configuration
- Deployment
Each sandbox lease is 24 hours. The 3-layer defense system prevents runaway costs:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ COST DEFENSE IN DEPTH (3 LAYERS) │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ LAYER 1: SERVICE CONTROL POLICIES (SCPs) [PREVENTION] │
│ ═══════════════════════════════════════════════════════════════════════ │
│ Module: scp-manager │
│ Controls WHAT actions are allowed │
│ • EC2 instance type allowlist (t2, t3, t3a, m5, m6i - small to xlarge) │
│ • GPU/accelerated instances BLOCKED (p2-p5, g3-g6, inf1-2, trn1-2) │
│ • EBS: io1/io2 BLOCKED, max volume 500GB │
│ • RDS: Instance class limits, Multi-AZ BLOCKED, IOPS BLOCKED │
│ • 20+ expensive services BLOCKED (SageMaker, EMR, Redshift, MSK, etc.) │
│ • ASG max size: 10, EKS nodegroup max: 5 │
│ │
│ LAYER 2: AWS BUDGETS [DETECTION] │
│ ═══════════════════════════════════════════════════════════════════════ │
│ Module: budgets-manager │
│ Controls HOW MUCH MONEY can be spent (with aggressive alerting) │
│ • Daily budget: $50/day (alerts at 10%, 50%, 100%) │
│ • Monthly budget: $1000/month │
│ • 10 service-specific budgets with <1 hour detection: │
│ - CloudWatch: $5/day (critical - no service quota for log ingestion) │
│ - Lambda: $10/day │
│ - DynamoDB: $5/day │
│ - Bedrock: $10/day │
│ - EC2, RDS, S3, API Gateway, Step Functions, Data Transfer │
│ • SNS notifications + optional automated actions │
│ │
│ LAYER 3: DYNAMODB BILLING ENFORCER [AUTO-REMEDIATE] │
│ ═══════════════════════════════════════════════════════════════════════ │
│ Module: dynamodb-billing-enforcer │
│ EventBridge + Lambda to close critical DynamoDB On-Demand gap │
│ • Detects CreateTable/UpdateTable with On-Demand billing │
│ • Auto-converts to Provisioned mode with enforced capacity limits │
│ • Max 100 RCU, 100 WCU per table (~$1.87/day vs UNLIMITED) │
│ • SNS alerts on enforcement actions │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
cd environments/ndx-production
terraform init
terraform plan
terraform apply

Purpose: Creates and manages Service Control Policies (SCPs) that PREVENT expensive actions at the AWS API level.
Location: modules/scp-manager/
Key Features:
- EC2 Controls: Allowlist of permitted instance types, explicit deny for GPU/accelerated
- EBS Controls: Block io1/io2 volumes, limit volume size to 500GB
- RDS Controls: Instance class restrictions, Multi-AZ blocked, read replicas blocked
- ElastiCache Controls: Node type restrictions
- Lambda Controls: Provisioned concurrency blocked
- EKS/ASG Controls: Nodegroup and Auto Scaling Group size limits
- Expensive Services Blocked: 20+ services including SageMaker, EMR, Redshift, MSK, FSx, etc.
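As an illustration of the EC2 allowlist control listed above, a deny-unless-allowlisted SCP statement could look roughly like this. It is a minimal sketch, not the module's actual code: the resource labels, the policy name, and the shortened default list are assumptions made for the example.

```hcl
# Sketch only: labels and the policy name are hypothetical.
variable "allowed_ec2_instance_types" {
  type    = list(string)
  default = ["t2.*", "t3.micro", "t3.small", "t3.medium"] # shortened example list
}

data "aws_iam_policy_document" "ec2_allowlist_sketch" {
  statement {
    sid       = "DenyNonAllowlistedInstanceTypes"
    effect    = "Deny"
    actions   = ["ec2:RunInstances"]
    resources = ["arn:aws:ec2:*:*:instance/*"]

    # StringNotLike so that wildcard entries such as "t2.*" are honoured.
    condition {
      test     = "StringNotLike"
      variable = "ec2:InstanceType"
      values   = var.allowed_ec2_instance_types
    }
  }
}

resource "aws_organizations_policy" "cost_avoidance_sketch" {
  name    = "ExampleCostAvoidanceScp" # hypothetical name
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.ec2_allowlist_sketch.json
}
```

Denying everything that does not match the allowlist means newly released instance families are blocked by default, rather than needing a new deny entry each time.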
SCPs Created:
| SCP Name | Purpose | Conditional |
|---|---|---|
| `InnovationSandboxAwsNukeSupportedServicesScp` | Allowlist of services (uses NotAction deny) | No |
| `InnovationSandboxRestrictionsScp` | Region restrictions and security isolation | No |
| `InnovationSandboxCostAvoidanceScp` | Comprehensive cost controls | No |
| `InnovationSandboxIamWorkloadIdentityScp` | IAM role/user creation with privilege escalation guardrails | Yes (`enable_iam_workload_identity`, disabled by default) |
IAM Workload Identity SCP:
When enabled, sandbox users can create IAM roles and users for their workloads (e.g. EC2 instance profiles, Lambda execution roles) while being prevented from escalating their own privileges.
Users CAN:
- Create IAM roles and users for workloads
- Attach policies to their created roles/users
- Create instance profiles for EC2
Users CANNOT:
- Create roles/users matching protected name patterns (`Admin*`, `InnovationSandbox*`, `AWSAccelerator*`, `OrganizationAccountAccessRole`, etc.)
- Modify, delete, or attach policies to privileged admin roles
- Pass or assume privileged roles (Control Tower, LZA, service-linked, admin)
variable "enable_iam_workload_identity" {
  default = false # Enable when sandbox users need to create IAM roles
}

Note: The Innovation Sandbox SecurityAndIsolationRestrictionsSCP must also be modified to remove `iam:CreateRole` and `iam:CreateUser` from its deny list for this SCP to take effect.
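The protected-name guardrail and the conditional creation described above could be expressed roughly as follows. This is a sketch under assumptions: the resource labels and policy name are hypothetical, only the name patterns quoted above are included, and the real SCP also covers modifying, passing, and assuming privileged roles.

```hcl
variable "enable_iam_workload_identity" {
  type    = bool
  default = false
}

data "aws_iam_policy_document" "iam_workload_identity_sketch" {
  statement {
    sid     = "DenyProtectedPrincipalNames"
    effect  = "Deny"
    actions = ["iam:CreateRole", "iam:CreateUser"]

    # Only the patterns named in this README; the real list is longer ("etc.").
    resources = [
      "arn:aws:iam::*:role/Admin*",
      "arn:aws:iam::*:role/InnovationSandbox*",
      "arn:aws:iam::*:role/AWSAccelerator*",
      "arn:aws:iam::*:role/OrganizationAccountAccessRole",
      "arn:aws:iam::*:user/Admin*",
    ]
  }
}

resource "aws_organizations_policy" "iam_workload_identity_sketch" {
  # The policy exists only when the feature flag is switched on.
  count   = var.enable_iam_workload_identity ? 1 : 0
  name    = "ExampleIamWorkloadIdentityScp" # hypothetical name
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.iam_workload_identity_sketch.json
}
```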
Key Variables:
variable "allowed_ec2_instance_types" {
default = ["t2.*", "t3.micro", "t3.small", "t3.medium", "t3.large", ...]
}
variable "max_ebs_volume_size_gb" { default = 500 }
variable "max_autoscaling_group_size" { default = 10 }
variable "max_eks_nodegroup_size" { default = 5 }

Purpose: Creates AWS Budgets with AGGRESSIVE thresholds for early detection of cost abuse. Alerts within <1 hour of most attack patterns.
Location: modules/budgets-manager/
Key Features:
- Daily and monthly overall budgets
- 10 service-specific budgets for attack vector coverage
- Multi-threshold alerts (50%, 80%, 100%)
- Optional automated actions (stop EC2 at threshold)
- SNS notifications + direct email subscriptions
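To make the multi-threshold idea concrete, a single service-specific daily budget might be declared along these lines. This is a sketch, not the module's code: the resource labels, budget name, and topic name are hypothetical, and the `Service` filter value `AmazonCloudWatch` is an assumption that should be checked against the service names shown in Cost Explorer.

```hcl
resource "aws_sns_topic" "budget_alerts_sketch" {
  name = "example-budget-alerts" # hypothetical topic name
  # A topic policy allowing budgets.amazonaws.com to publish is also
  # required; it is omitted here for brevity.
}

resource "aws_budgets_budget" "cloudwatch_daily_sketch" {
  name         = "example-cloudwatch-daily"
  budget_type  = "COST"
  limit_amount = "5"
  limit_unit   = "USD"
  time_unit    = "DAILY"

  cost_filter {
    name   = "Service"
    values = ["AmazonCloudWatch"] # assumed service name
  }

  # One notification per threshold: 50%, 80%, 100% of the daily limit.
  dynamic "notification" {
    for_each = [50, 80, 100]
    content {
      comparison_operator       = "GREATER_THAN"
      threshold                 = notification.value
      threshold_type            = "PERCENTAGE"
      notification_type         = "ACTUAL"
      subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts_sketch.arn]
    }
  }
}
```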
Budget Thresholds (Aggressive):
| Service | Daily Limit | Alert At (50%) | Time to Detect Max Abuse |
|---|---|---|---|
| CloudWatch | $5 | $2.50 | ~40 seconds |
| Lambda | $10 | $5 | ~20 minutes |
| DynamoDB | $5 | $2.50 | ~46 minutes |
| Bedrock | $10 | $5 | ~33 minutes |
| EC2 | $10 | $5 | ~1.5 hours |
| RDS | $5 | $2.50 | ~3 hours |
| S3 | $10 | $5 | ~1 hour |
| Step Functions | $5 | $2.50 | ~1 hour |
| API Gateway | $5 | $2.50 | ~2 hours |
| Data Transfer | $10 | $5 | ~1 hour |
Why So Aggressive?
CloudWatch Logs ingestion has NO service quota protection. At a potential abuse rate of $225/hour, spend crosses the 50% threshold ($2.50) of a $5/day budget after ~40 seconds of malicious activity.
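The ~40 second figure follows from simple arithmetic, shown here as Terraform locals so it can be re-run with other limits. The local and output names are illustrative only; the numbers are the ones quoted above.

```hcl
locals {
  cloudwatch_daily_limit_usd = 5   # daily CloudWatch budget
  alert_threshold_fraction   = 0.5 # 50% alert threshold
  abuse_rate_usd_per_hour    = 225 # worst-case ingestion spend quoted above

  alert_amount_usd     = local.cloudwatch_daily_limit_usd * local.alert_threshold_fraction
  seconds_to_threshold = local.alert_amount_usd * 3600 / local.abuse_rate_usd_per_hour
}

output "cloudwatch_seconds_to_threshold" {
  value = local.seconds_to_threshold # 2.5 * 3600 / 225 = 40
}
```

This models only how quickly spend reaches the threshold. When the notification actually arrives also depends on how often AWS refreshes its billing data.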
Purpose: Closes the CRITICAL gap where DynamoDB On-Demand mode has no capacity limits.
Location: modules/dynamodb-billing-enforcer/
The Problem:
- DynamoDB On-Demand mode is purely pay-per-request with no capacity limits
- Attacker could create On-Demand tables and generate unlimited costs
- There is NO SCP condition key for `dynamodb:BillingMode`
The Solution:
- EventBridge rule detects `CreateTable` and `UpdateTable` API calls via CloudTrail
- Lambda function checks billing mode of the table
- If On-Demand detected, automatically converts to Provisioned with enforced limits
- Sends SNS alert about the enforcement action
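The detection step in the list above could be wired up roughly as follows. This is a sketch under assumptions: the resource labels, rule name, and the `enforcer_lambda_arn` variable are hypothetical, and the Lambda function itself is not shown.

```hcl
variable "enforcer_lambda_arn" {
  type        = string
  description = "ARN of the enforcer Lambda (hypothetical input for this sketch)"
}

resource "aws_cloudwatch_event_rule" "dynamodb_table_changes_sketch" {
  name        = "example-dynamodb-billing-mode-watch"
  description = "Match DynamoDB CreateTable/UpdateTable calls recorded by CloudTrail"

  event_pattern = jsonencode({
    source        = ["aws.dynamodb"]
    "detail-type" = ["AWS API Call via CloudTrail"]
    detail = {
      eventSource = ["dynamodb.amazonaws.com"]
      eventName   = ["CreateTable", "UpdateTable"]
    }
  })
}

resource "aws_cloudwatch_event_target" "enforcer_sketch" {
  rule = aws_cloudwatch_event_rule.dynamodb_table_changes_sketch.name
  arn  = var.enforcer_lambda_arn
  # An aws_lambda_permission granting events.amazonaws.com the right to
  # invoke the function is also needed; omitted here for brevity.
}
```

Because the rule matches on CloudTrail-delivered events, it only fires in accounts and regions where CloudTrail is recording management events.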
Enforcement Modes:
| Mode | Action |
|---|---|
| `convert` | Auto-convert to Provisioned (RECOMMENDED) |
| `delete` | Delete the table (aggressive) |
| `alert` | Alert only, no remediation |
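One way to make sure only these three modes are accepted is a variable validation block. This is a sketch: the description text and the `convert` default are assumptions based on the recommendation above, not confirmed module defaults.

```hcl
variable "enforcement_mode" {
  description = "How the enforcer reacts to On-Demand tables: convert, delete, or alert"
  type        = string
  default     = "convert" # assumed default, since it is the recommended mode

  validation {
    condition     = contains(["convert", "delete", "alert"], var.enforcement_mode)
    error_message = "enforcement_mode must be one of: convert, delete, alert."
  }
}
```

With a check like this, a typo such as `"alerts"` fails at `terraform plan` rather than silently leaving tables unenforced.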
Enforced Capacity:
max_rcu = 100 # ~$0.31/day per table
max_wcu = 100 # ~$1.56/day per table
# Total: ~$1.87/day per table (vs UNLIMITED in On-Demand)

| Service | Protection Layer | Max Daily Cost |
|---|---|---|
| EC2 Compute | SCP (instance type limits) | ~$77 |
| EBS Storage | SCP (io1/io2 blocked, 500GB max) | ~$6 |
| RDS | SCP (instance class + Multi-AZ blocked) | ~$22 |
| ElastiCache | SCP (node type limits) | ~$40 |
| Lambda | Budget ($10/day) | ~$10 |
| DynamoDB | Enforcer + Budget | ~$5 |
| Bedrock | Budget ($10/day) | ~$10 |
| CloudWatch | Budget ($5/day, alerts fast) | ~$5+ |
| Total Bounded | | ~$175/day |

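The totals in this analysis can be reproduced with a few locals. The per-service figures are copied from the table above. The hourly RCU/WCU rates are assumptions, chosen because they reproduce the ~$0.31 and ~$1.56 per-table figures quoted for the enforcer; actual prices vary by region.

```hcl
locals {
  # Per-service daily ceilings from the table above (USD).
  bounded_daily_costs = {
    ec2_compute = 77
    ebs_storage = 6
    rds         = 22
    elasticache = 40
    lambda      = 10
    dynamodb    = 5
    bedrock     = 10
    cloudwatch  = 5
  }
  total_bounded_daily = sum(values(local.bounded_daily_costs)) # 175

  # Assumed provisioned-capacity rates (USD per unit-hour).
  rcu_hourly_usd = 0.00013
  wcu_hourly_usd = 0.00065

  # 100 RCU + 100 WCU held for 24 hours: 24 * (0.013 + 0.065) = 1.872
  dynamodb_table_daily = 24 * (100 * local.rcu_hourly_usd + 100 * local.wcu_hourly_usd)
}

output "total_bounded_daily_usd" {
  value = local.total_bounded_daily
}

output "dynamodb_table_daily_usd" {
  value = local.dynamodb_table_daily
}
```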
| Attack Vector | Before | After | Reduction |
|---|---|---|---|
| CloudWatch Log Flood | $21,600/day | Budget alerts in ~40 sec | 99%+ awareness |
| Lambda Memory Abuse | $1,440/day | Budget alert at $10/day | 99%+ awareness |
| DynamoDB On-Demand | UNLIMITED | $1.87/table (auto-convert) | 99%+ |
| GPU Instances | $786+/day | BLOCKED (SCP) | 100% |
| Expensive Services | $1000s/day | BLOCKED (SCP) | 100% |

| Attack Vector | Layer 1 (SCP) | Layer 2 (Budget) | Layer 3 (Enforcer) |
|---|---|---|---|
| GPU Instances | ✅ BLOCKED | ✅ $10/day | - |
| Large EC2 | ✅ Type limit | ✅ $10/day | - |
| EBS io1/io2 | ✅ BLOCKED | ✅ via EC2 | - |
| RDS Multi-AZ | ✅ BLOCKED | ✅ $5/day | - |
| Lambda Memory | ❌ No SCP key | ✅ $10/day | - |
| DynamoDB On-Demand | ❌ No SCP key | ✅ $5/day | ✅ AUTO-CONVERT |
| CloudWatch Logs | ❌ No SCP key | ✅ $5/day | - |
| Bedrock Tokens | - | ✅ $10/day | - |
| API Gateway | - | ✅ $5/day | - |
| SageMaker | ✅ BLOCKED | - | - |
| EMR | ✅ BLOCKED | - | - |
| Redshift | ✅ BLOCKED | - | - |
Legend:
- ✅ Protected
- ❌ No protection at this layer (covered by other layers)
- `-` Not applicable
The environments/ndx-production/ configuration uses these key variables:
# Enable/disable modules
variable "enable_budgets" { default = true }
variable "enable_dynamodb_billing_enforcer" { default = true }
# Budget limits (aggressive defaults)
variable "daily_budget_limit" { default = 50 }
variable "monthly_budget_limit" { default = 1000 }
# Alert recipients - set via GitHub Actions secret BUDGET_ALERT_EMAILS
# Do NOT hardcode emails in terraform files
variable "budget_alert_emails" {
  default = [] # Provided via TF_VAR_budget_alert_emails from GitHub secret
}

Each module has extensive configuration options. Key customization points:
- Adjust EC2 instance allowlist (`scp-manager`): `allowed_ec2_instance_types = ["t3.micro", "t3.small", "t3.medium"]`
- Adjust budget thresholds (`budgets-manager`): `cloudwatch_daily_limit = 10` (less aggressive)
- DynamoDB enforcement mode (`dynamodb-billing-enforcer`): `enforcement_mode = "alert"` (just alert, don't auto-convert)
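Put together, the three overrides above might appear in the environment's module calls roughly as follows. This is a sketch under assumptions: the `scp_manager` label matches the import addresses used in this README, but the other two module labels are hypothetical, and each module may require further inputs not shown here.

```hcl
module "scp_manager" {
  source = "../../modules/scp-manager"

  allowed_ec2_instance_types = ["t3.micro", "t3.small", "t3.medium"]
}

module "budgets_manager" { # hypothetical label
  source = "../../modules/budgets-manager"

  cloudwatch_daily_limit = 10 # less aggressive than the $5/day default
}

module "dynamodb_billing_enforcer" { # hypothetical label
  source = "../../modules/dynamodb-billing-enforcer"

  enforcement_mode = "alert" # alert only, no auto-convert
}
```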
Prerequisites:
- AWS CLI configured with Organizations admin access
- Terraform >= 1.5
- Access to the management account
# 1. Initialize
cd environments/ndx-production
terraform init
# 2. Review changes
terraform plan
# 3. Apply
terraform apply

If SCPs were created by ISB CDK, import them first:
# Find SCP IDs
aws organizations list-policies --filter SERVICE_CONTROL_POLICY \
--query 'Policies[?starts_with(Name, `InnovationSandbox`)].{Name:Name,Id:Id}'
# Import
terraform import 'module.scp_manager.aws_organizations_policy.nuke_supported_services' p-xxxxxxxxx
terraform import 'module.scp_manager.aws_organizations_policy.limit_regions' p-yyyyyyyyy

LZA (Landing Zone Accelerator) may revert SCP changes. Disable the revert in security-config.yaml:
scpRevertChangesConfig:
  enable: false

After deployment, key outputs include:
cost_defense_summary = {
  layer_1_scps = { status = "Always enabled", controls = [...] }
  layer_2_budgets = { service_budgets = "10 services monitored" }
  layer_3_dynamodb_enforcer = { mode = "Auto-convert On-Demand to Provisioned" }
  gap_analysis = {
    critical_gaps_closed = [
      "CloudWatch Logs: Budget alert at $5/day",
      "DynamoDB On-Demand: Auto-convert enforcer",
    ]
    defense_effectiveness = "~95% of attack vectors blocked or bounded"
  }
}

terraform-scp-overrides/
├── README.md # This file
├── modules/
│ ├── scp-manager/ # Layer 1: Service Control Policies
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── budgets-manager/ # Layer 2: AWS Budgets
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── dynamodb-billing-enforcer/ # Layer 3: Auto-remediation
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
└── environments/
└── ndx-production/ # Production environment
├── main.tf
├── variables.tf
└── terraform.tfvars

- No credentials stored: GitHub Actions authenticates via OIDC
- SCPs use exempt role patterns: admin roles are not blocked
- Budgets alert, they don't block: ensure monitoring is active
- DynamoDB enforcer requires CloudTrail: ensure CloudTrail is enabled

Internal use only - NDX Innovation Sandbox / UK Government Digital Service