A GitHub Template for learning Platform Engineering & SRE practices with AWS EKS
This is a production-ready GitHub template for a structured 6-8 week learning journey through modern platform engineering and Site Reliability Engineering (SRE) practices.
Perfect for:
- Building a portfolio project to showcase platform engineering skills
- Learning AWS EKS, Kubernetes, and Infrastructure as Code
- Transitioning into Platform Engineering or SRE roles
- Understanding production-grade reliability patterns
- Hands-on experience with real cloud infrastructure
By the end of this 8-week journey, you'll have:
- ✅ Production-grade EKS cluster running on AWS
- ✅ Containerized microservice with FastAPI
- ✅ Full CI/CD pipeline with GitHub Actions
- ✅ Infrastructure as Code using Terraform
- ✅ Comprehensive observability with CloudWatch
- ✅ Chaos engineering experiments documented
- ✅ Portfolio-ready case study to showcase
- ✅ Real production experience to discuss in interviews
Click the green "Use this template" button at the top of this page, then:
- Select "Create a new repository"
- Name your repo (e.g., my-reliability-lab)
- Choose Public visibility (for portfolio)
- Click "Create repository from template"
# Clone your new repository
git clone git@github.com:YOUR-USERNAME/my-reliability-lab.git
cd my-reliability-lab
# Navigate to the lab directory
cd reliability-lab
# Run customization script
chmod +x .github/template-cleanup.sh
./.github/template-cleanup.sh YOUR-USERNAME my-reliability-lab
# Commit customizations
git add .
git commit -m "chore: customize template"
git push
# Install dependencies
python3 -m venv venv
source venv/bin/activate
make dev
# Run tests
make test
# Start the app
cd deploy/compose
docker-compose up
Then open docs/lab-roadmap.md and begin Week 1!
reliability-lab/
├── services/sample-app/      # FastAPI microservice
│   ├── app/main.py           # Application code
│   ├── tests/                # Test suite
│   └── Dockerfile            # Multi-stage build
├── deploy/
│   ├── compose/              # Local development
│   ├── k8s/                  # Kubernetes manifests
│   └── terraform/            # Infrastructure as Code
├── ops/
│   ├── observability/        # Monitoring configs
│   └── security/             # Security baselines
├── docs/
│   ├── architecture/adr/     # Architecture decisions
│   ├── lab-roadmap/          # Week-by-week guides
│   ├── runbooks/             # Operational procedures
│   └── reliability/          # SRE concepts
└── .github/workflows/        # CI/CD pipelines
| Week | Focus | Time | Outcome |
|---|---|---|---|
| 1 | EKS & Containerization | 10-15h | App running on EKS |
| 2 | Terraform & AWS Baseline | 10-15h | IaC foundation |
| 3 | Kubernetes Hardening | 10-15h | Production patterns |
| 4 | CI/CD Pipeline | 10-15h | Automated deployments |
| 5 | Observability | 10-15h | Metrics, logs, SLOs |
| 6 | Chaos Engineering | 10-15h | Resilience testing |
| 7-8 | Polish & Portfolio | 15-20h | Case study complete |
Total: 75-110 hours over 6-8 weeks
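To see where the total comes from, here is a minimal sketch that sums the per-row estimates from the table above. The names `ROADMAP_HOURS` and `total_hours` are illustrative and not part of the template.

```python
# Time estimates copied from the roadmap table: (label, min hours, max hours).
# The "Weeks 7-8" row is a single combined estimate, as in the table.
ROADMAP_HOURS = [
    ("Week 1", 10, 15),
    ("Week 2", 10, 15),
    ("Week 3", 10, 15),
    ("Week 4", 10, 15),
    ("Week 5", 10, 15),
    ("Week 6", 10, 15),
    ("Weeks 7-8", 15, 20),
]


def total_hours(rows):
    """Return the (low, high) total across all roadmap rows."""
    low = sum(min_h for _, min_h, _ in rows)
    high = sum(max_h for _, _, max_h in rows)
    return low, high


low, high = total_hours(ROADMAP_HOURS)
print(f"{low}-{high} hours")  # prints "75-110 hours"
```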
✅ Security First
- Non-root containers
- Trivy security scanning
- IAM least-privilege patterns
- No secrets in code (IRSA + Secrets Manager)
✅ Cost Optimized
- Designed for AWS Free Tier where possible
- Cost-conscious architecture decisions
- ~$90-180 total budget (with optimization)
- Guidance on scaling to $0 between sessions
✅ Best Practices
- Multi-stage Docker builds
- Kubernetes health probes
- Infrastructure as Code
- Comprehensive testing
- Pre-commit hooks
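To illustrate what the health-probe item refers to, here is a minimal sketch of separate liveness and readiness endpoints. It uses only the Python standard library as a stand-in; the template's actual service is FastAPI, and the paths `/healthz` and `/readyz`, the `READY` flag, and `make_server` are assumptions chosen for illustration rather than names taken from the template.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once startup work (config loading, connections) has finished.
READY = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process is up and able to answer requests.
            self._reply(200, {"status": "alive"})
        elif self.path == "/readyz":
            # Readiness: only accept traffic after startup has completed.
            if READY.is_set():
                self._reply(200, {"status": "ready"})
            else:
                self._reply(503, {"status": "starting"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the console


def make_server(port=0):
    """Create the probe server on localhost; port 0 picks a free port.

    Inside a container you would bind to 0.0.0.0 instead.
    """
    return ThreadingHTTPServer(("127.0.0.1", port), ProbeHandler)
```

To try it locally, call `make_server(8080)`, then `READY.set()`, then `serve_forever()` on the returned server. A Kubernetes `livenessProbe` would point at the first path and a `readinessProbe` at the second, so a pod that is still starting is kept out of rotation without being restarted.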
✅ Documentation
- Architecture Decision Records (ADRs)
- Operational runbooks
- SRE concepts and glossary
- Portfolio-ready case study template
For an enhanced learning experience, use this template with the companion Reliability Lab Setup Guide repository.
The setup guide provides:
- Claude AI configuration as your personal SRE mentor
- Step-by-step onboarding instructions
- Weekly coaching content
- Learning reinforcement exercises
- Community support
With AI assistance, you get:
- Real-time explanations of complex concepts
- Troubleshooting help when stuck
- Best practices recommendations
- Cost optimization guidance
- Portfolio-building advice
- Are transitioning into Platform Engineering or SRE roles
- Learn best by building real projects
- Need portfolio projects to showcase
- Want hands-on AWS and Kubernetes experience
- Have a budget of $100-200 for learning
- Can dedicate 10-15 hours per week for 6-8 weeks
- ✅ Basic programming (Python helpful but not required)
- ✅ Git and GitHub fundamentals
- ✅ Command line basics
- ✅ Basic cloud concepts (AWS account setup)
⚠️ No Kubernetes expertise needed - you'll learn it!

⚠️ No Terraform expertise needed - you'll learn it!
Expected Costs:
- Week 1: $10-20 (EKS cluster exploration)
- Weeks 2-6: $15-20/week (active development)
- Weeks 7-8: $5-10 (polish & documentation)
- Total: $90-180 (6-8 weeks)
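As a quick sanity check on the figures above, here is a minimal sketch that totals the itemized ranges. It takes the Weeks 2-6 figure as per week (as written) and assumes the Weeks 7-8 figure covers both weeks together; that second reading is an assumption. Under it the items sum to $90-130, so the upper end of the stated $90-180 total includes headroom that the list does not itemize.

```python
# (label, number of billing periods, low USD, high USD) from the list above.
COST_ITEMS = [
    ("Week 1", 1, 10, 20),
    ("Weeks 2-6", 5, 15, 20),   # per-week figure, five weeks
    ("Weeks 7-8", 1, 5, 10),    # assumption: one figure for both weeks
]


def total_cost(items):
    """Return the (low, high) total in USD across all items."""
    low = sum(periods * lo for _, periods, lo, _ in items)
    high = sum(periods * hi for _, periods, _, hi in items)
    return low, high


low, high = total_cost(COST_ITEMS)
print(f"${low}-${high}")  # prints "$90-$130"
```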
Optimization Tips:
- Scale EKS nodes to 0 when not working
- Delete LoadBalancers between sessions
- Use CloudWatch (free tier) instead of paid tools
- Destroy infrastructure between weeks if taking breaks
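The first tip can be scripted. The sketch below builds the `aws eks update-nodegroup-config` command for scaling a managed node group down to zero and back up. It assumes the cluster uses EKS managed node groups; `demo-cluster` and `demo-nodes` are placeholder names, and the helper itself is hypothetical rather than part of the template. Note that the EKS control plane is billed per hour regardless of node count, so scaling nodes to zero reduces cost but does not eliminate it.

```python
import shlex


def nodegroup_scaling_command(cluster, nodegroup, desired, max_size=1):
    """Build the AWS CLI argv that sets a managed node group's size.

    minSize is fixed at 0 so the group is allowed to scale down fully.
    EKS requires maxSize to be at least 1.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if not 0 <= desired <= max_size:
        raise ValueError("desired must be between 0 and max_size")
    scaling = f"minSize=0,maxSize={max_size},desiredSize={desired}"
    return [
        "aws", "eks", "update-nodegroup-config",
        "--cluster-name", cluster,
        "--nodegroup-name", nodegroup,
        "--scaling-config", scaling,
    ]


# Placeholder names; substitute your own cluster and node group.
stop_cmd = nodegroup_scaling_command("demo-cluster", "demo-nodes", desired=0)
start_cmd = nodegroup_scaling_command("demo-cluster", "demo-nodes", desired=2, max_size=3)
print(shlex.join(stop_cmd))
print(shlex.join(start_cmd))
# To actually run one: subprocess.run(stop_cmd, check=True)
```

Returning the argument list instead of running it keeps the sketch safe to try without AWS credentials; you can review the printed command before executing it.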
Ultra-budget option: ~$30/month by running only on weekends
- AWS account (personal tier is fine)
- GitHub account
- Claude.ai account (optional, for AI assistance)
- Docker Desktop
- AWS CLI v2
- kubectl
- Terraform
- Python 3.11+
- Git
- Make
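If you prefer a single check over running each tool by hand, here is a minimal sketch that looks for the command-line tools listed above on your `PATH` and checks the Python version. The executable names (`docker`, `aws`, `kubectl`, `terraform`, `git`, `make`) are the usual ones for these tools; the helper functions are illustrative and not part of the template. It only confirms the tools are present, not their versions, so it will not tell AWS CLI v1 from v2.

```python
import shutil
import sys

# Executables normally provided by the tools in the list above.
REQUIRED_TOOLS = ["docker", "aws", "kubectl", "terraform", "git", "make"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=sys.version_info, minimum=(3, 11)):
    """Return True if the given Python version meets the minimum."""
    return tuple(version[:2]) >= minimum


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found on PATH")
if not python_ok():
    print("Python 3.11+ is required; found", sys.version.split()[0])
```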
Verify prerequisites:
docker --version
aws --version
kubectl version --client
terraform --version
python3 --version
- TEMPLATE.md - Detailed template usage guide
- README.md - Project overview
- CLAUDE.md - AI assistant guidance
- docs/lab-roadmap.md - 8-week learning plan
After completing this lab, you'll be able to:
- ✅ Deploy and manage AWS EKS clusters
- ✅ Write Infrastructure as Code with Terraform
- ✅ Build production-ready containerized applications
- ✅ Implement CI/CD pipelines with GitHub Actions
- ✅ Configure observability and monitoring
- ✅ Apply Kubernetes production patterns (HPA, PDB, IRSA)
- ✅ Conduct chaos engineering experiments
- ✅ Implement security best practices
- ✅ Public GitHub repository showcasing skills
- ✅ Architecture diagrams and documentation
- ✅ Technical case study writeup
- ✅ Evidence of production-grade work
- ✅ Demonstrable problem-solving ability
- ✅ Real infrastructure experience to discuss
- ✅ Understanding of SRE principles
- ✅ Cost and security awareness
- ✅ Production incident handling experience
- ✅ Clear articulation of technical decisions
- ✅ Production-grade, not toy examples
- ✅ Cost-conscious throughout
- ✅ Security-first approach
- ✅ Portfolio-ready outcomes
- ✅ Complete infrastructure, not just snippets
- ✅ Structured learning path with clear milestones
- ✅ Explanations included - understand the "why"
- ✅ Guided troubleshooting - learn to debug
- ✅ Time-boxed - complete in 6-8 weeks
- ✅ Budget-friendly - under $200 total
Found an issue? Have an improvement?
- Fork this repository
- Create a feature branch
- Make your changes
- Submit a Pull Request
All contributions welcome - from typo fixes to new features!
Completed the lab? We'd love to hear about it!
- Star this repository
- Share your case study (link in Discussions)
- Tweet with #ReliabilityLab
- Add to your LinkedIn portfolio
- Share screenshots in Discussions
- Template Issues: Open a GitHub Issue
- Learning Questions: Use GitHub Discussions
- Community: Join relevant Slack channels (CNCF, AWS)
- AI Assistance: Use the setup-guide companion repo
MIT License - see LICENSE
Free to use for learning, portfolio, or commercial purposes.
Inspired by:
- Google SRE Book
- AWS Well-Architected Framework
- Kubernetes Best Practices
- Production engineering experiences
- Community feedback and contributions
- Click "Use this template" above
- Clone your new repository
- Run the customization script
- Start Week 1!
Your platform engineering journey begins now!
Questions? Check Discussions or open an Issue
Template Version: 1.0.0 | Last Updated: 2025-11-01 | Maintained by: YOUR-USERNAME