
SteveLeve/reliability-lab-template


🎓 Reliability Lab Template

A GitHub Template for learning Platform Engineering & SRE practices with AWS EKS

License: MIT | GitHub Template | AWS | Terraform | Kubernetes


🎯 What is This?

This is a production-ready GitHub template for a structured 6-8 week learning journey through modern platform engineering and Site Reliability Engineering (SRE) practices.

Perfect for:

  • 💼 Building a portfolio project to showcase platform engineering skills
  • 📚 Learning AWS EKS, Kubernetes, and Infrastructure as Code
  • 🎓 Transitioning into Platform Engineering or SRE roles
  • 🚀 Understanding production-grade reliability patterns
  • 💡 Gaining hands-on experience with real cloud infrastructure

⭐ What You'll Build

By the end of the 6-8 week journey, you'll have:

  • ✅ A production-grade EKS cluster running on AWS
  • ✅ A containerized microservice built with FastAPI
  • ✅ A full CI/CD pipeline with GitHub Actions
  • ✅ Infrastructure as Code using Terraform
  • ✅ Comprehensive observability with CloudWatch
  • ✅ Documented chaos engineering experiments
  • ✅ A portfolio-ready case study to showcase
  • ✅ Hands-on cloud infrastructure experience to discuss in interviews

🚀 Quick Start

1. Use This Template

Click the green "Use this template" button at the top of this page, then:

  1. Select "Create a new repository"
  2. Name your repo (e.g., my-reliability-lab)
  3. Choose Public visibility (for portfolio)
  4. Click "Create repository from template"

2. Clone and Customize

# Clone your new repository
git clone git@github.com:YOUR-USERNAME/my-reliability-lab.git
cd my-reliability-lab

# Navigate to the lab directory
cd reliability-lab

# Run customization script
chmod +x .github/template-cleanup.sh
./.github/template-cleanup.sh YOUR-USERNAME my-reliability-lab

# Commit customizations
git add .
git commit -m "chore: customize template"
git push
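The contents of `.github/template-cleanup.sh` are not reproduced here. As a rough idea of what a template-customization step usually does, the Python sketch below swaps placeholder tokens such as `YOUR-USERNAME` for real values across the text files in a checkout. The function name `customize_template` and the placeholder mapping are illustrative assumptions, not the script's actual behavior.

```python
from pathlib import Path


def customize_template(root: Path, replacements: dict[str, str]) -> list[Path]:
    """Replace placeholder tokens in every UTF-8 text file under `root`.

    Hypothetical helper for illustration; returns the files it modified.
    """
    changed: list[Path] = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything under Git's own metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary file: leave it untouched
        new_text = text
        for placeholder, value in replacements.items():
            new_text = new_text.replace(placeholder, value)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `customize_template(Path("."), {"YOUR-USERNAME": "octocat"})` would rewrite the "Maintained by: YOUR-USERNAME" footer along with any other occurrences.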

3. Start Learning!

# Create a virtual environment and install dependencies
python3 -m venv venv
source venv/bin/activate
make dev

# Run tests
make test

# Start the app (Docker Compose v2; older installs use docker-compose)
cd deploy/compose
docker compose up
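While the Compose stack runs in the foreground, you can check from a second terminal that the service is answering. Below is a small stdlib-only Python sketch; the URL `http://localhost:8000/health` is an assumption (8000 is uvicorn's default port), so substitute the port and health path that the compose file and `app/main.py` actually use.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """GET `url` and return the HTTP status code, or 0 if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return exc.code
    except OSError:  # refused, DNS failure, timeout (URLError is an OSError)
        return 0


def wait_until_healthy(url: str, attempts: int = 30, delay: float = 1.0,
                       fetch: Callable[[str], int] = http_status) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Usage: `print(wait_until_healthy("http://localhost:8000/health"))`. The `fetch` parameter exists so the polling logic can be tested without a running server.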

Then open docs/lab-roadmap.md and begin Week 1!

📚 What's Included

Complete Application Stack

reliability-lab/
├── services/sample-app/          # FastAPI microservice
│   ├── app/main.py               # Application code
│   ├── tests/                    # Test suite
│   └── Dockerfile                # Multi-stage build
├── deploy/
│   ├── compose/                  # Local development
│   ├── k8s/                      # Kubernetes manifests
│   └── terraform/                # Infrastructure as Code
├── ops/
│   ├── observability/            # Monitoring configs
│   └── security/                 # Security baselines
├── docs/
│   ├── architecture/adr/         # Architecture decisions
│   ├── lab-roadmap/              # Week-by-week guides
│   ├── runbooks/                 # Operational procedures
│   └── reliability/              # SRE concepts
└── .github/workflows/            # CI/CD pipelines

8-Week Structured Learning Path

| Week | Focus                    | Time   | Outcome               |
|------|--------------------------|--------|-----------------------|
| 1    | EKS & Containerization   | 10-15h | App running on EKS    |
| 2    | Terraform & AWS Baseline | 10-15h | IaC foundation        |
| 3    | Kubernetes Hardening     | 10-15h | Production patterns   |
| 4    | CI/CD Pipeline           | 10-15h | Automated deployments |
| 5    | Observability            | 10-15h | Metrics, logs, SLOs   |
| 6    | Chaos Engineering        | 10-15h | Resilience testing    |
| 7-8  | Polish & Portfolio       | 15-20h | Case study complete   |

Total: 75-110 hours over 6-8 weeks
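The total follows directly from the table: six weeks at 10-15h plus 15-20h for weeks 7-8. The sketch below reproduces that sum and shows the weekly pace it implies; note that finishing in 6 weeks works out to roughly 12.5-18.3h per week, somewhat above the 10-15h per-week figure, so the short end of the schedule assumes heavier weeks.

```python
# (low, high) hours per row of the learning-path table.
PLAN_HOURS = {
    "Week 1": (10, 15),
    "Week 2": (10, 15),
    "Week 3": (10, 15),
    "Week 4": (10, 15),
    "Week 5": (10, 15),
    "Week 6": (10, 15),
    "Weeks 7-8": (15, 20),
}


def total_hours(plan: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high estimates separately."""
    return (sum(low for low, _ in plan.values()),
            sum(high for _, high in plan.values()))


def weekly_pace(total: tuple[int, int], weeks: int) -> tuple[float, float]:
    """Average hours per week needed to finish `total` within `weeks` weeks."""
    low, high = total
    return round(low / weeks, 1), round(high / weeks, 1)


print(total_hours(PLAN_HOURS))                  # (75, 110)
print(weekly_pace(total_hours(PLAN_HOURS), 6))  # (12.5, 18.3)
```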

Production-Ready Features

✅ Security First

  • Non-root containers
  • Trivy security scanning
  • IAM least-privilege patterns
  • No secrets in code (IRSA + Secrets Manager)
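The "IRSA + Secrets Manager" pattern means the pod obtains AWS credentials through its Kubernetes service account, and the app fetches secrets at runtime instead of reading them from code or env files. A minimal Python sketch, assuming the secret is stored as a JSON string and the service account's IAM role allows `secretsmanager:GetSecretValue`; the secret name `reliability-lab/db` is hypothetical.

```python
import json
from typing import Any, Optional


def load_secret(secret_id: str, client: Optional[Any] = None) -> dict:
    """Fetch a JSON-encoded secret from AWS Secrets Manager and parse it.

    With IRSA, boto3's default credential chain reads the web-identity token
    that EKS projects into the pod, so no access keys appear in code or config.
    """
    if client is None:
        import boto3  # imported lazily so tests can pass a fake client instead
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    # String secrets arrive in "SecretString"; this sketch assumes JSON content.
    return json.loads(response["SecretString"])
```

In the app this would be called once at startup, e.g. `creds = load_secret("reliability-lab/db")`.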

✅ Cost Optimized

  • Designed for AWS Free Tier where possible
  • Cost-conscious architecture decisions
  • ~$90-180 total budget (with optimization)
  • Guidance on scaling down to near-$0 between sessions

✅ Best Practices

  • Multi-stage Docker builds
  • Kubernetes health probes
  • Infrastructure as Code
  • Comprehensive testing
  • Pre-commit hooks
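Kubernetes health probes are plain configuration, but their timing is worth reasoning about. The sketch below builds `livenessProbe` and `readinessProbe` specs as Python dicts (the keys mirror the Kubernetes Probe fields) and estimates how long a failure can go unnoticed. The `/healthz` and `/readyz` paths and port 8000 are assumptions; check `deploy/k8s/` and `app/main.py` for the real values.

```python
def http_probe(path: str, port: int = 8000, period_seconds: int = 10,
               failure_threshold: int = 3, initial_delay_seconds: int = 5) -> dict:
    """Return a Kubernetes httpGet probe spec as a plain dict (same keys as the YAML)."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay_seconds,
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


def worst_case_detection_seconds(probe: dict) -> int:
    """Rough upper bound between the app breaking and Kubernetes reacting,
    ignoring timeoutSeconds: failureThreshold consecutive failed checks,
    one every periodSeconds."""
    return probe["periodSeconds"] * probe["failureThreshold"]


# Hypothetical endpoints. A failing liveness probe restarts the container;
# a failing readiness probe only removes the pod from Service endpoints.
probes = {
    "livenessProbe": http_probe("/healthz"),
    "readinessProbe": http_probe("/readyz", period_seconds=5),
}
print({name: worst_case_detection_seconds(p) for name, p in probes.items()})
# {'livenessProbe': 30, 'readinessProbe': 15}
```

Giving readiness a shorter period than liveness is a common choice: traffic is pulled from a struggling pod quickly, while restarts stay conservative.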

✅ Documentation

  • Architecture Decision Records (ADRs)
  • Operational runbooks
  • SRE concepts and glossary
  • Portfolio-ready case study template

🤖 Optional: AI-Assisted Learning

For an enhanced learning experience, use this template with the companion Reliability Lab Setup Guide repository.

The setup guide provides:

  • Claude AI configuration as your personal SRE mentor
  • Step-by-step onboarding instructions
  • Weekly coaching content
  • Learning reinforcement exercises
  • Community support

With AI assistance, you get:

  • Real-time explanations of complex concepts
  • Troubleshooting help when stuck
  • Best practices recommendations
  • Cost optimization guidance
  • Portfolio-building advice

💡 Who Is This For?

Perfect If You:

  • 🎯 Are transitioning into Platform Engineering or SRE roles
  • 📚 Learn best by building real projects
  • 💼 Need portfolio projects to showcase
  • 🚀 Want hands-on AWS and Kubernetes experience
  • 💰 Have a budget of $100-200 for learning
  • ⏰ Can dedicate 10-15 hours per week for 6-8 weeks

You Should Know:

  • ✅ Basic programming (Python helpful but not required)
  • ✅ Git and GitHub fundamentals
  • ✅ Command line basics
  • ✅ Basic cloud concepts (AWS account setup)
  • ⚠️ No Kubernetes expertise needed - you'll learn it!
  • ⚠️ No Terraform expertise needed - you'll learn it!

💰 Cost Breakdown

Expected Costs:

  • Week 1: $10-20 (EKS cluster exploration)
  • Weeks 2-6: $15-20/week (active development)
  • Weeks 7-8: $5-10 (polish & documentation)
  • Total: $90-180 (6-8 weeks)
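A quick way to sanity-check the estimate is to add up the line items. Read literally, and assuming "Weeks 7-8: $5-10" covers both weeks, they sum to $90-130, so the $180 upper bound appears to leave room for overruns. The sketch also prices the EKS control plane, which AWS bills per cluster-hour ($0.10/hour under standard support at the time of writing; check current pricing) regardless of how many nodes are running. That fixed cost is why tearing down between sessions matters.

```python
# Itemized (low, high) USD ranges from the list above.
# Assumption: "Weeks 7-8: $5-10" is a total for both weeks, not per week.
COST_ITEMS = {
    "Week 1": (10, 20),
    "Weeks 2-6": (15 * 5, 20 * 5),  # $15-20/week over 5 weeks
    "Weeks 7-8": (5, 10),
}

EKS_CONTROL_PLANE_USD_PER_HOUR = 0.10  # standard-support list price; may change


def budget_range(items: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high cost estimates separately."""
    return (sum(low for low, _ in items.values()),
            sum(high for _, high in items.values()))


def control_plane_cost(hours: float, rate: float = EKS_CONTROL_PLANE_USD_PER_HOUR) -> float:
    """Cost of keeping one EKS cluster (control plane only) alive for `hours`."""
    return round(hours * rate, 2)


print(budget_range(COST_ITEMS))    # (90, 130)
print(control_plane_cost(24 * 7))  # 16.8 (one always-on week, before nodes)
```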

Optimization Tips:

  • Scale EKS node groups to 0 when not working (the control plane is still billed hourly)
  • Delete LoadBalancers between sessions
  • Use CloudWatch (free tier) instead of paid tools
  • Destroy infrastructure between weeks if taking breaks
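Scaling a managed node group to zero can be scripted with the EKS `UpdateNodegroupConfig` API. A minimal boto3 sketch, assuming an EKS managed node group; the cluster and node group names in the usage line are hypothetical, and `max_size=2` is an arbitrary default.

```python
from typing import Any, Optional


def scale_nodegroup(cluster: str, nodegroup: str, desired: int,
                    max_size: int = 2, client: Optional[Any] = None) -> dict:
    """Resize an EKS managed node group; desired=0 parks it between sessions.

    The control plane keeps billing while the cluster exists, so this trims
    node (EC2) costs only.
    """
    if max_size < 1 or not 0 <= desired <= max_size:
        raise ValueError("need max_size >= 1 and 0 <= desired <= max_size")
    if client is None:
        import boto3  # lazy import so the function is testable without AWS
        client = boto3.client("eks")
    return client.update_nodegroup_config(
        clusterName=cluster,
        nodegroupName=nodegroup,
        scalingConfig={"minSize": 0, "maxSize": max_size, "desiredSize": desired},
    )
```

Usage: `scale_nodegroup("reliability-lab", "default", 0)` at the end of a session and `scale_nodegroup("reliability-lab", "default", 2)` at the start of the next. If Terraform manages the node group, a change made this way may show up as drift on the next `terraform plan` unless the configuration ignores changes to the desired size.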

Ultra-budget option: ~$30/month by running only on weekends and destroying the cluster in between

🛠️ Prerequisites

Required Accounts

  • AWS account (a personal account is fine)
  • GitHub account
  • Claude.ai account (optional, for AI assistance)

Required Tools

  • Docker Desktop
  • AWS CLI v2
  • kubectl
  • Terraform
  • Python 3.11+
  • Git
  • Make

Verify prerequisites:

docker --version
docker compose version
aws --version
kubectl version --client
terraform --version
python3 --version
git --version
make --version
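The same check can be scripted. This Python sketch only confirms that each tool is on `PATH` and that the interpreter is 3.11 or newer; it does not verify tool versions (for example, that the AWS CLI is v2). The tool list is taken from Required Tools above.

```python
import shutil
import sys
from typing import Callable, Optional

REQUIRED_TOOLS = ("docker", "aws", "kubectl", "terraform", "git", "make")


def missing_tools(tools=REQUIRED_TOOLS,
                  which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=sys.version_info, minimum=(3, 11)) -> bool:
    """True if the interpreter (or the given version tuple) meets the minimum."""
    return tuple(version[:2]) >= minimum


def check_prerequisites() -> list[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = [f"{tool} not found on PATH" for tool in missing_tools()]
    if not python_ok():
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} found; 3.11+ required")
    return problems
```

Usage: `print("\n".join(check_prerequisites()) or "All prerequisites found")`.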

📖 Documentation

🎯 Learning Outcomes

After completing this lab, you'll be able to:

Technical Skills

  • ✅ Deploy and manage AWS EKS clusters
  • ✅ Write Infrastructure as Code with Terraform
  • ✅ Build production-ready containerized applications
  • ✅ Implement CI/CD pipelines with GitHub Actions
  • ✅ Configure observability and monitoring
  • ✅ Apply Kubernetes production patterns (HPA, PDB, IRSA)
  • ✅ Conduct chaos engineering experiments
  • ✅ Implement security best practices
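Of the Kubernetes patterns listed above, the Horizontal Pod Autoscaler (HPA) is the one with arithmetic behind it. The Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The Python sketch below implements just that rule plus min/max clamping; the real controller also applies stabilization windows, scaling policies, and special handling for missing metrics and unready pods, all omitted here. The `min_replicas=1` and `max_replicas=10` defaults are arbitrary.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA calculation: ceil(current * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped to bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
print(hpa_desired_replicas(4, current_metric=63, target_metric=60))  # 4 (within tolerance)
print(hpa_desired_replicas(4, current_metric=20, target_metric=60))  # 2
```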

Portfolio Assets

  • ✅ Public GitHub repository showcasing skills
  • ✅ Architecture diagrams and documentation
  • ✅ Technical case study writeup
  • ✅ Evidence of production-grade work
  • ✅ Demonstrable problem-solving ability

Interview Readiness

  • ✅ Real infrastructure experience to discuss
  • ✅ Understanding of SRE principles
  • ✅ Cost and security awareness
  • ✅ Incident-handling practice from chaos experiments and runbooks
  • ✅ Clear articulation of technical decisions
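SLOs appear in Week 5, and the arithmetic behind an error budget is a core SRE principle worth being able to explain. The sketch below converts an availability SLO into allowed downtime over a window and tracks how much of a request-based budget is left. The 30-day window and the example numbers are illustrative assumptions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction such as 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of a request-based error budget still unspent (can go negative)."""
    if total_events == 0:
        return 1.0
    allowed_failures = (1 - slo) * total_events
    actual_failures = total_events - good_events
    return 1 - actual_failures / allowed_failures


print(round(error_budget_minutes(0.999), 1))  # 43.2
print(round(budget_remaining(0.999, good_events=999_500, total_events=1_000_000), 2))  # 0.5
```

A 99.9% SLO over 30 days allows about 43 minutes of downtime; in the request-based example, 500 failures out of a million requests have spent half of the 1,000-failure budget.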

🌟 Features That Set This Apart

From Other Tutorials:

  • ✅ Production-grade, not toy examples
  • ✅ Cost-conscious throughout
  • ✅ Security-first approach
  • ✅ Portfolio-ready outcomes
  • ✅ Complete infrastructure, not just snippets

From Real Work:

  • ✅ Structured learning path with clear milestones
  • ✅ Explanations included - understand the "why"
  • ✅ Guided troubleshooting - learn to debug
  • ✅ Time-boxed - complete in 6-8 weeks
  • ✅ Budget-friendly - under $200 total

🤝 Contributing

Found an issue? Have an improvement?

  1. Fork this repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a Pull Request

All contributions welcome - from typo fixes to new features!

📣 Share Your Success

Completed the lab? We'd love to hear about it!

  • ⭐ Star this repository
  • 📝 Share your case study (link in Discussions)
  • 🐦 Tweet with #ReliabilityLab
  • 💼 Add to your LinkedIn portfolio
  • 📸 Share screenshots in Discussions

📞 Getting Help

  • Template Issues: Open a GitHub Issue
  • Learning Questions: Use GitHub Discussions
  • Community: Join relevant Slack channels (CNCF, AWS)
  • AI Assistance: Use the setup-guide companion repo

⚖️ License

MIT License - see LICENSE

Free to use for learning, portfolio, or commercial purposes.

🙏 Acknowledgments

Inspired by:

  • Google SRE Book
  • AWS Well-Architected Framework
  • Kubernetes Best Practices
  • Production engineering experiences
  • Community feedback and contributions

🚀 Ready to Start?

  1. Click "Use this template" above
  2. Clone your new repository
  3. Run the customization script
  4. Start Week 1!

Your platform engineering journey begins now! 🎓


Questions? Check Discussions or open an Issue

Template Version: 1.0.0 | Last Updated: 2025-11-01 | Maintained by: YOUR-USERNAME
