A comprehensive hands-on course for engineering graduates transitioning into AI and data engineering, using GitHub Codespaces, Azure, and the Microsoft ecosystem.
This course takes students through a practical journey from traditional data systems to modern AI-driven data engineering, with hands-on projects using real tools and platforms.
- Engineering graduates with Python programming experience
- Professionals looking to transition into AI and data engineering
- Students wanting practical, hands-on experience with modern data tools
- Foundation: Data fundamentals, types, and organizational needs
- Traditional Stack: SQL databases, Python analytics, data warehousing
- Business Intelligence: Power BI, visualization, reporting
- Hands-on Project: Interactive data dashboard website
- Cloud Architecture: Azure Data Factory, Synapse Analytics
- MLOps Foundation: Azure Machine Learning, model deployment
- Advanced Analytics: Predictive modeling, automated pipelines
- Semantic Data: Vector databases, embeddings, Azure AI Search (formerly Azure Cognitive Search)
- Graph Databases: Knowledge systems with Azure Cosmos DB
- AI Agents: Automated analysis, memory systems, intelligent workflows
- Development Environment: GitHub Codespaces
- Cloud Platform: Microsoft Azure (free tier)
- Databases: Azure SQL, PostgreSQL with pgvector, Cosmos DB
- Analytics: Power BI, Azure Machine Learning, Azure AI Search (formerly Azure Cognitive Search)
- AI Tools: Claude Code, GitHub Copilot
- Languages: Python, SQL, JavaScript/HTML/CSS
- Basic Python programming knowledge
- GitHub account
- Azure account (free tier available)
- Fork this repository
  git clone https://github.com/your-username/data-ai-course.git
  cd data-ai-course
- Open in GitHub Codespaces
  - Click "Code" → "Codespaces" → "Create codespace on main"
  - Wait for environment setup to complete
- Start with Phase 1
  cd phase1-traditional/html-dashboard
  live-server --port=3000
- Access the Interactive Dashboard
  - Open the forwarded port 3000 in your browser
  - Begin your data engineering journey!
- Interactive Dashboard: phase1-traditional/html-dashboard/
- Learning Objectives:
  - Understand data types: structured, unstructured, semi-structured
  - Explore traditional storage systems and their evolution
  - Practice hands-on text analysis and basic data manipulation
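To make the three data types concrete, here is a minimal Python sketch using only the standard library. The sample records and the `word_frequencies` helper are hypothetical illustrations, not code from the course dashboard.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_csv = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but fields can differ per record.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["sql"]}, {"id": 2, "email": "l@example.com"}]'
)

# Unstructured: free text with no schema; any structure has to be extracted.
unstructured = "Data pipelines move data. Clean data makes pipelines reliable."


def word_frequencies(text: str) -> Counter:
    """Lowercase the text, pull out alphabetic words, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


freqs = word_frequencies(unstructured)
print(rows[0]["city"])                     # value read from a fixed column
print(sorted(semi_structured[1].keys()))   # keys vary between records
print(freqs.most_common(2))                # simple text analysis
```

The same word-counting idea is a reasonable starting point for the text analysis exercises; a real pipeline would typically add tokenization rules and stop-word handling.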
- Project: Database design and advanced querying
- Tools: Azure SQL Database, SQL Server Management Studio
- Practice: Interactive SQL playground with real datasets
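As a rough illustration of the kind of design-and-query work this project involves, the sketch below builds a two-table schema and runs a join with aggregation. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Azure SQL Database, and the `customers` and `orders` tables are hypothetical. T-SQL syntax on Azure SQL differs in places, so treat this as a sketch rather than the course's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for Azure SQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders (customer_id, amount) VALUES (1, 40.0), (1, 60.0), (2, 25.0);
    """
)

# Join and aggregate: order count and total spend per customer, highest first.
query = """
SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```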
- Project: Data analysis pipeline using pandas and numpy
- Visualization: matplotlib, seaborn integration
- Jupyter Labs: Interactive data exploration
- Project: Power BI dashboard connected to Azure SQL
- Skills: DAX formulas, data modeling, report design
- Integration: Automated refresh and sharing
- Every concept introduced with a practical exercise
- Real-world datasets and scenarios
- Industry-standard tools and practices
- Start with familiar concepts (traditional databases)
- Gradually introduce modern concepts (vector databases, AI agents)
- Build comprehensive understanding through iteration
- Microsoft ecosystem focus (Azure, Power BI, etc.)
- GitHub-based workflows
- AI-assisted development with Claude Code and Copilot
- Interactive website demonstrating data concepts
- SQL query playground
- Data visualization with Chart.js
- Text analysis simulation
- Azure ML pipeline creation
- Model training and deployment
- Automated data processing workflows
- Vector database implementation
- AI agent for automated analysis
- Graph database knowledge system
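The core idea behind a vector database is nearest-neighbor search over embeddings. The sketch below shows that idea with brute-force cosine similarity in plain Python. The three-dimensional vectors and document names are made up for illustration; a real project would use model-generated embeddings and an indexed store such as PostgreSQL with pgvector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2):
    """Score every stored vector against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
index = {
    "sql-basics": [0.9, 0.1, 0.0],
    "power-bi-dax": [0.2, 0.9, 0.1],
    "vector-search": [0.1, 0.2, 0.9],
}

results = top_k([0.8, 0.2, 0.1], index, k=2)
print([doc_id for doc_id, _ in results])
```

Brute-force scoring is linear in the number of stored vectors, which is why production systems rely on approximate nearest-neighbor indexes instead.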
The course uses a fully configured development environment with:
- Python 3.11 with data science libraries
- Node.js 18 for frontend development
- Azure CLI for cloud integration
- VS Code Extensions: Python, Jupyter, Azure tools, GitHub Copilot
- Pre-configured Ports: 3000 (Dashboard), 8888 (Jupyter), 5432 (PostgreSQL)
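If a service does not seem reachable, a quick check like the following can show which of the listed ports currently has something listening. This is a hypothetical helper, not part of the course repository, and it assumes the services bind to localhost inside the Codespace.

```python
import socket

# Port-to-service mapping taken from the list above.
EXPECTED_PORTS = {3000: "Dashboard", 8888: "Jupyter", 5432: "PostgreSQL"}


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def port_report(ports: dict[int, str] = EXPECTED_PORTS) -> dict[str, bool]:
    """Map each service name to whether its port is currently accepting connections."""
    return {name: is_port_open(port) for port, name in ports.items()}


for name, is_up in port_report().items():
    print(f"{name}: {'listening' if is_up else 'not listening'}")
```

A port showing "not listening" usually just means the service has not been started yet, for example before running `live-server --port=3000`.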
By the end of this course, students will be able to:
- Design and implement traditional data systems
  - SQL databases, data warehouses, ETL pipelines
  - Business intelligence and reporting systems
- Build modern cloud-based data architectures
  - Azure data services integration
  - MLOps workflows and model deployment
- Create AI-driven data solutions
  - Vector databases and semantic search
  - AI agents for automated data analysis
  - Graph databases for knowledge management
- Use industry-standard tools effectively
  - GitHub workflows and collaboration
  - AI-assisted development practices
  - Cloud platform management
This is an educational project. Contributions are welcome:
- Bug fixes and improvements
- Additional exercises and examples
- Documentation enhancements
- New project ideas
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: Use GitHub Issues for technical problems
- Discussions: Use GitHub Discussions for questions and ideas
- Documentation: Check the docs/ folder for detailed guides
Ready to start your data engineering journey?
- Open GitHub Codespaces
- Navigate to phase1-traditional/html-dashboard/
- Run live-server --port=3000
- Open your browser and begin learning!