🎓 Reliability Lab Template

A GitHub Template for learning Platform Engineering & SRE practices with AWS EKS

🎯 What is This?

This is a production-ready GitHub template for a structured 6-8 week learning journey through modern platform engineering and Site Reliability Engineering (SRE) practices.

Perfect for:

💼 Building a portfolio project to showcase platform engineering skills
📚 Learning AWS EKS, Kubernetes, and Infrastructure as Code
🎓 Transitioning into Platform Engineering or SRE roles
🚀 Understanding production-grade reliability patterns
💡 Getting hands-on experience with real cloud infrastructure

⭐ What You'll Build

By the end of this 8-week journey, you'll have:

✅ Production-grade EKS cluster running on AWS
✅ Containerized microservice with FastAPI
✅ Full CI/CD pipeline with GitHub Actions
✅ Infrastructure as Code using Terraform
✅ Comprehensive observability with CloudWatch
✅ Chaos engineering experiments documented
✅ Portfolio-ready case study to showcase
✅ Real production experience to discuss in interviews

🚀 Quick Start

1. Use This Template

Click the green "Use this template" button at the top of this page, then:

Select "Create a new repository"
Name your repo (e.g., my-reliability-lab)
Choose Public visibility (for portfolio)
Click "Create repository from template"

2. Clone and Customize

# Clone your new repository
git clone git@github.com:YOUR-USERNAME/my-reliability-lab.git
cd my-reliability-lab

# Navigate to the lab directory
cd reliability-lab

# Run customization script
chmod +x .github/template-cleanup.sh
./.github/template-cleanup.sh YOUR-USERNAME my-reliability-lab

# Commit customizations
git add .
git commit -m "chore: customize template"
git push

3. Start Learning!

# Install dependencies
python3 -m venv venv
source venv/bin/activate
make dev

# Run tests
make test

# Start the app
cd deploy/compose
docker compose up   # Compose V2 syntax; with the legacy standalone binary, run docker-compose up

Then open docs/lab-roadmap.md and begin Week 1!

📚 What's Included

Complete Application Stack

reliability-lab/
├── services/sample-app/          # FastAPI microservice
│   ├── app/main.py              # Application code
│   ├── tests/                   # Test suite
│   └── Dockerfile               # Multi-stage build
├── deploy/
│   ├── compose/                 # Local development
│   ├── k8s/                     # Kubernetes manifests
│   └── terraform/               # Infrastructure as Code
├── ops/
│   ├── observability/           # Monitoring configs
│   └── security/                # Security baselines
├── docs/
│   ├── architecture/adr/        # Architecture decisions
│   ├── lab-roadmap/             # Week-by-week guides
│   ├── runbooks/                # Operational procedures
│   └── reliability/             # SRE concepts
└── .github/workflows/           # CI/CD pipelines

8-Week Structured Learning Path

Week	Focus	Time	Outcome
1	EKS & Containerization	10-15h	App running on EKS
2	Terraform & AWS Baseline	10-15h	IaC foundation
3	Kubernetes Hardening	10-15h	Production patterns
4	CI/CD Pipeline	10-15h	Automated deployments
5	Observability	10-15h	Metrics, logs, SLOs
6	Chaos Engineering	10-15h	Resilience testing
7-8	Polish & Portfolio	15-20h	Case study complete

Total: 75-110 hours over 6-8 weeks
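
The total follows from the table: six single weeks at 10-15 hours each, plus the combined Weeks 7-8 row at 15-20 hours. A quick shell check of that arithmetic (the variable names are illustrative only):

```shell
# Weeks 1-6: six weeks at 10-15 hours each (from the table above).
# Weeks 7-8: one combined row of 15-20 hours.
min_hours=$(( 6 * 10 + 15 ))
max_hours=$(( 6 * 15 + 20 ))
echo "Total: ${min_hours}-${max_hours} hours"
```

This prints "Total: 75-110 hours", matching the figure above.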

Production-Ready Features

✅ Security First

Non-root containers
Trivy security scanning
IAM least-privilege patterns
No secrets in code (IRSA + Secrets Manager)

✅ Cost Optimized

Designed for AWS Free Tier where possible
Cost-conscious architecture decisions
~$90-180 total budget (with optimization)
Guidance on reducing spend to $0 between sessions

✅ Best Practices

Multi-stage Docker builds
Kubernetes health probes
Infrastructure as Code
Comprehensive testing
Pre-commit hooks

✅ Documentation

Architecture Decision Records (ADRs)
Operational runbooks
SRE concepts and glossary
Portfolio-ready case study template

🤖 Optional: AI-Assisted Learning

For an enhanced learning experience, use this template with the companion Reliability Lab Setup Guide repository.

The setup guide provides:

Claude AI configuration as your personal SRE mentor
Step-by-step onboarding instructions
Weekly coaching content
Learning reinforcement exercises
Community support

With AI assistance, you get:

Real-time explanations of complex concepts
Troubleshooting help when stuck
Best practices recommendations
Cost optimization guidance
Portfolio-building advice

💡 Who Is This For?

Perfect If You:

🎯 Are transitioning into Platform Engineering or SRE roles
📚 Learn best by building real projects
💼 Need portfolio projects to showcase
🚀 Want hands-on AWS and Kubernetes experience
💰 Have a budget of $100-200 for learning
⏰ Can dedicate 10-15 hours per week for 6-8 weeks

You Should Know:

✅ Basic programming (Python helpful but not required)
✅ Git and GitHub fundamentals
✅ Command line basics
✅ Basic cloud concepts (AWS account setup)
⚠️ No Kubernetes expertise needed - you'll learn it!
⚠️ No Terraform expertise needed - you'll learn it!

💰 Cost Breakdown

Expected Costs:

Week 1: $10-20 (EKS cluster exploration)
Weeks 2-6: $15-20/week (active development)
Weeks 7-8: $5-10 (polish & documentation)
Total: $90-180 (6-8 weeks)

Optimization Tips:

Scale EKS nodes to 0 when not working
Delete LoadBalancers between sessions
Use CloudWatch (free tier) instead of paid tools
Destroy infrastructure between weeks if taking breaks

Ultra-budget option: ~$30/month by running only on weekends
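
As a rough sketch of the "scale EKS nodes to 0" tip, the function below builds the AWS CLI call that resizes a node group. It assumes the cluster uses an EKS managed node group. The cluster and node group names are hypothetical placeholders, and the DRY_RUN switch is an illustration, not part of this template. By default it only prints the command.

```shell
# Hypothetical names; replace with your own cluster and node group.
CLUSTER_NAME="${CLUSTER_NAME:-reliability-lab}"
NODEGROUP_NAME="${NODEGROUP_NAME:-default}"

# Print (DRY_RUN=1, the default) or run the AWS CLI call that resizes the node group.
# Usage: scale_nodegroup <desired-size> <max-size>
scale_nodegroup() {
  desired="$1"
  max="$2"
  # Rebuild the positional parameters as the full command line.
  set -- aws eks update-nodegroup-config \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name "$NODEGROUP_NAME" \
    --scaling-config "minSize=0,maxSize=${max},desiredSize=${desired}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# End of a session: no worker nodes.
scale_nodegroup 0 2
```

Set DRY_RUN=0 to run the command, and call scale_nodegroup 1 2 to bring a node back before the next session. If Terraform manages the node group, changing the size in the Terraform configuration avoids drift between code and cluster. The EKS control plane is billed separately from worker nodes, so destroying the infrastructure is the way to stop charges during longer breaks.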

🛠️ Prerequisites

Required Accounts

AWS account (a personal account is fine)
GitHub account
Claude.ai account (optional, for AI assistance)

Required Tools

Docker Desktop
AWS CLI v2
kubectl
Terraform
Python 3.11+
Git
Make

Verify prerequisites:

docker --version
aws --version
kubectl version --client
terraform --version
python3 --version
git --version
make --version
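
To check everything in one pass, a small shell function can loop over the command names from the Required Tools list and report any that are not on your PATH. This is a sketch, not part of the template. check_tools is a hypothetical helper name, and it only confirms that each command exists, not that its version is recent enough.

```shell
# Report which of the given commands are available on PATH.
# Prints "ok: <name>" or "missing: <name>" for each argument and
# returns non-zero if at least one command is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Docker Desktop provides the docker command; AWS CLI v2 provides aws.
check_tools docker aws kubectl terraform python3 git make \
  || echo "Install the missing tools before starting Week 1."
```

After it passes, run the individual version commands to confirm you have AWS CLI v2 and Python 3.11 or newer.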

📖 Documentation

TEMPLATE.md - Detailed template usage guide
README.md - Project overview
CLAUDE.md - AI assistant guidance
docs/lab-roadmap.md - 8-week learning plan

🎯 Learning Outcomes

After completing this lab, you'll be able to:

Technical Skills

✅ Deploy and manage AWS EKS clusters
✅ Write Infrastructure as Code with Terraform
✅ Build production-ready containerized applications
✅ Implement CI/CD pipelines with GitHub Actions
✅ Configure observability and monitoring
✅ Apply Kubernetes production patterns (HPA, PDB, IRSA)
✅ Conduct chaos engineering experiments
✅ Implement security best practices

Portfolio Assets

✅ Public GitHub repository showcasing skills
✅ Architecture diagrams and documentation
✅ Technical case study writeup
✅ Evidence of production-grade work
✅ Demonstrable problem-solving ability

Interview Readiness

✅ Real infrastructure experience to discuss
✅ Understanding of SRE principles
✅ Cost and security awareness
✅ Production incident handling experience
✅ Clear articulation of technical decisions

🌟 Features That Set This Apart

From Other Tutorials:

✅ Production-grade, not toy examples
✅ Cost-conscious throughout
✅ Security-first approach
✅ Portfolio-ready outcomes
✅ Complete infrastructure, not just snippets

From Real Work:

✅ Structured learning path with clear milestones
✅ Explanations included - understand the "why"
✅ Guided troubleshooting - learn to debug
✅ Time-boxed - complete in 6-8 weeks
✅ Budget-friendly - under $200 total

🤝 Contributing

Found an issue? Have an improvement?

Fork this repository
Create a feature branch
Make your changes
Submit a Pull Request

All contributions welcome - from typo fixes to new features!

📣 Share Your Success

Completed the lab? We'd love to hear about it!

⭐ Star this repository
📝 Share your case study (link in Discussions)
🐦 Tweet with #ReliabilityLab
💼 Add to your LinkedIn portfolio
📸 Share screenshots in Discussions

📞 Getting Help

Template Issues: Open a GitHub Issue
Learning Questions: Use GitHub Discussions
Community: Join relevant Slack channels (CNCF, AWS)
AI Assistance: Use the setup-guide companion repo

⚖️ License

MIT License - see LICENSE

Free to use for learning, portfolio, or commercial purposes.

🙏 Acknowledgments

Inspired by:

Google SRE Book
AWS Well-Architected Framework
Kubernetes Best Practices
Production engineering experiences
Community feedback and contributions

🚀 Ready to Start?

Click "Use this template" above
Clone your new repository
Run the customization script
Start Week 1!

Your platform engineering journey begins now! 🎓

Questions? Check Discussions or open an Issue

Template Version: 1.0.0
Last Updated: 2025-11-01
Maintained by: YOUR-USERNAME
