🚀 Complete hands-on course: Traditional Data → Modern Engineering → AI-Driven Systems. Interactive demos, GitHub Codespaces ready, Azure integration. Perfect for engineering graduates transitioning to AI careers.

maree217/data-engineering-journey

Data Engineering Journey: From Traditional to AI-Driven Systems

A comprehensive hands-on course for engineering graduates transitioning into AI and data engineering, using GitHub Codespaces, Azure, and the Microsoft ecosystem.

🎯 Course Overview

This course takes students through a practical journey from traditional data systems to modern AI-driven data engineering, with hands-on projects using real tools and platforms.

Target Audience

  • Engineering graduates with Python programming experience
  • Professionals looking to transition into AI and data engineering
  • Students wanting practical, hands-on experience with modern data tools

🏗️ Course Structure

Phase 1: Traditional Data Engineering (Weeks 1-4)

  • Foundation: Data fundamentals, types, and organizational needs
  • Traditional Stack: SQL databases, Python analytics, data warehousing
  • Business Intelligence: Power BI, visualization, reporting
  • Hands-on Project: Interactive data dashboard website

Phase 2: Modern Data Engineering (Weeks 5-8)

  • Cloud Architecture: Azure Data Factory, Synapse Analytics
  • MLOps Foundation: Azure Machine Learning, model deployment
  • Advanced Analytics: Predictive modeling, automated pipelines

Phase 3: AI-Driven Data Systems (Weeks 9-12)

  • Semantic Data: Vector databases, embeddings, Azure Cognitive Search
  • Graph Databases: Knowledge systems with Azure Cosmos DB
  • AI Agents: Automated analysis, memory systems, intelligent workflows
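
As a rough illustration of the semantic-search idea behind Phase 3, the sketch below ranks documents by cosine similarity between embedding vectors. The three-dimensional vectors, the document names, and the `nearest` helper are made up for this example and are not part of the course materials. Real embeddings have hundreds of dimensions and would normally come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings": invented values, for illustration only.
documents = {
    "sql_basics": [0.9, 0.1, 0.0],
    "power_bi_reports": [0.6, 0.4, 0.1],
    "vector_search": [0.1, 0.2, 0.9],
}


def nearest(query_vector, docs, k=1):
    """Return the names of the k documents most similar to query_vector."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query_vector, docs[name]),
        reverse=True,
    )
    return ranked[:k]


print(nearest([0.0, 0.1, 1.0], documents))
```

This brute-force scan compares the query against every document. A vector database such as PostgreSQL with pgvector is typically used so that the same kind of ranking can be served from an index instead.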

🛠️ Technology Stack

  • Development Environment: GitHub Codespaces
  • Cloud Platform: Microsoft Azure (free tier)
  • Databases: Azure SQL, PostgreSQL with pgvector, Cosmos DB
  • Analytics: Power BI, Azure Machine Learning, Azure Cognitive Search
  • AI Tools: Claude Code, GitHub Copilot
  • Languages: Python, SQL, JavaScript/HTML/CSS

🚀 Getting Started

Prerequisites

  • Basic Python programming knowledge
  • GitHub account
  • Azure account (free tier available)

Setup Instructions

  1. Fork this repository, then clone your fork

    git clone https://github.com/your-username/data-engineering-journey.git
    cd data-engineering-journey
  2. Open in GitHub Codespaces

    • Click "Code" → "Codespaces" → "Create codespace on main"
    • Wait for environment setup to complete
  3. Start with Phase 1

    cd phase1-traditional/html-dashboard
    live-server --port=3000
  4. Access the Interactive Dashboard

    • Open the forwarded port 3000 in your browser
    • Begin your data engineering journey!

📚 Course Modules

Phase 1 Modules

Week 1: Data Fundamentals

  • Interactive Dashboard: phase1-traditional/html-dashboard/
  • Learning Objectives:
    • Understand data types: structured, unstructured, semi-structured
    • Explore traditional storage systems and their evolution
    • Hands-on text analysis and basic data manipulation
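
The three data shapes listed above can be contrasted in a few lines of Python. This is a minimal sketch: the sample records and the `word_frequencies` helper are hypothetical and are not taken from the course dashboard.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: fixed columns, every row has the same fields (CSV here).
structured = io.StringIO("id,city,temp_c\n1,Oslo,4\n2,Cairo,29\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing, fields may vary per record (JSON here).
semi_structured = json.loads('{"id": 3, "city": "Lima", "tags": ["coastal"]}')

# Unstructured: free text with no schema.
unstructured = "Data pipelines move data. Clean data makes pipelines useful."


def word_frequencies(text, top_n=2):
    """Return the top_n most common lowercase words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


print(rows[0]["city"])                 # a column looked up by name
print(semi_structured["tags"])         # a nested, optional field
print(word_frequencies(unstructured))  # basic text analysis
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as `temp_c` need an explicit conversion before any arithmetic.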

Week 2: SQL Deep Dive

  • Project: Database design and advanced querying
  • Tools: Azure SQL Database, SQL Server Management Studio
  • Practice: Interactive SQL playground with real datasets
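
To try the kind of join-and-aggregate query this week covers without provisioning Azure SQL, the sketch below uses Python's built-in `sqlite3` module as a local stand-in. The `customers` and `orders` tables and the `total_per_customer` helper are invented for illustration. SQLite's dialect also differs from the T-SQL used by Azure SQL Database, so treat this as practice for the query shape rather than as course code.

```python
import sqlite3

# In-memory SQLite database, used only as a local stand-in for Azure SQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0);
    """
)


def total_per_customer(connection):
    """Join orders to customers and sum the order amounts per customer."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY total DESC
    """
    return connection.execute(query).fetchall()


print(total_per_customer(conn))
```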

Week 3: Python Analytics

  • Project: Data analysis pipeline using pandas and numpy
  • Visualization: matplotlib, seaborn integration
  • Jupyter Labs: Interactive data exploration

Week 4: Business Intelligence

  • Project: Power BI dashboard connected to Azure SQL
  • Skills: DAX formulas, data modeling, report design
  • Integration: Automated refresh and sharing

🎓 Learning Approach

Hands-On First

  • Every concept introduced with a practical exercise
  • Real-world datasets and scenarios
  • Industry-standard tools and practices

Progressive Complexity

  • Start with familiar concepts (traditional databases)
  • Gradually introduce modern concepts (vector databases, AI agents)
  • Build comprehensive understanding through iteration

Industry Integration

  • Microsoft ecosystem focus (Azure, Power BI, etc.)
  • GitHub-based workflows
  • AI-assisted development with Claude Code and Copilot

📊 Sample Projects

Phase 1: Traditional Data Dashboard

  • Interactive website demonstrating data concepts
  • SQL query playground
  • Data visualization with Chart.js
  • Text analysis simulation

Phase 2: MLOps Pipeline (Coming Soon)

  • Azure ML pipeline creation
  • Model training and deployment
  • Automated data processing workflows

Phase 3: AI-Driven Analytics (Coming Soon)

  • Vector database implementation
  • AI agent for automated analysis
  • Graph database knowledge system

🔧 Development Environment

The course uses a fully configured development environment with:

  • Python 3.11 with data science libraries
  • Node.js 18 for frontend development
  • Azure CLI for cloud integration
  • VS Code Extensions: Python, Jupyter, Azure tools, GitHub Copilot
  • Pre-configured Ports: 3000 (Dashboard), 8888 (Jupyter), 5432 (PostgreSQL)
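
If a forwarded port does not respond, one quick check is whether anything is listening on it inside the Codespace. The sketch below is a hypothetical helper, not part of the repository. It takes the port numbers from the list above and assumes the services bind to `127.0.0.1`.

```python
import socket

# Ports listed in the environment description above.
COURSE_PORTS = {3000: "Dashboard", 8888: "Jupyter", 5432: "PostgreSQL"}


def is_port_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, service in COURSE_PORTS.items():
        state = "listening" if is_port_open(port) else "not listening"
        print(f"{port} ({service}): {state}")
```

A port reported as "not listening" usually means the service has not been started yet, for example `live-server` for the dashboard.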

📈 Learning Outcomes

By the end of this course, students will be able to:

  1. Design and implement traditional data systems

    • SQL databases, data warehouses, ETL pipelines
    • Business intelligence and reporting systems
  2. Build modern cloud-based data architectures

    • Azure data services integration
    • MLOps workflows and model deployment
  3. Create AI-driven data solutions

    • Vector databases and semantic search
    • AI agents for automated data analysis
    • Graph databases for knowledge management
  4. Use industry-standard tools effectively

    • GitHub workflows and collaboration
    • AI-assisted development practices
    • Cloud platform management

🀝 Contributing

This is an educational project, and contributions are welcome:

  • Bug fixes and improvements
  • Additional exercises and examples
  • Documentation enhancements
  • New project ideas

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

  • Issues: Use GitHub Issues for technical problems
  • Discussions: Use GitHub Discussions for questions and ideas
  • Documentation: Check the docs/ folder for detailed guides

Ready to start your data engineering journey?

  1. Open GitHub Codespaces
  2. Navigate to phase1-traditional/html-dashboard/
  3. Run live-server --port=3000
  4. Open your browser and begin learning! 🚀
