Azure Data Engineering Project

Introduction

This project demonstrates various data transformations using Azure services. The main objective is to showcase how to build a scalable and efficient data pipeline using Azure Data Factory, Azure Databricks, and Azure Synapse Analytics. It includes two Logic Apps and implements all of the available Azure Data Factory activities. The motivation behind this project is to provide a comprehensive guide for data engineers to leverage Azure's capabilities for data processing and analytics.

Getting Started

Prerequisites

Before you begin, ensure you have met the following requirements:

  • You have an Azure subscription. If you don't have an Azure account, you can create one on the Azure website.
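
If you have the Azure CLI installed, a quick way to confirm you are signed in to the intended subscription is sketched below (the subscription name is a placeholder):

    # Sign in and inspect the active subscription
    az login
    az account show --output table
    # Optionally switch to a specific subscription (placeholder name)
    az account set --subscription "<your-subscription>"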

Installation Process

  1. Clone the repository:

    git clone https://github.com/pavithra19/DataEngineeringProject.git
    cd DataEngineeringProject
  2. Set up the required Azure services (a sketch of the CLI commands follows this list):

    Azure SQL Database
    Azure Data Lake Storage
    Azure Data Factory
  3. Import the ARM templates:

    Import the ARM template code and run the pipelines.
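
The services listed in step 2 can also be provisioned from the Azure CLI. The commands below are only a sketch, not part of this repository; all resource names, the location, and the credentials are placeholders, and the Data Factory commands assume the optional datafactory CLI extension.

    # Resource group to hold all project resources (placeholder names and location)
    az group create --name rg-dataeng --location eastus

    # Azure Data Lake Storage Gen2: storage account with hierarchical namespace enabled
    az storage account create --name dataengadls --resource-group rg-dataeng \
        --location eastus --sku Standard_LRS --kind StorageV2 \
        --enable-hierarchical-namespace true

    # Azure SQL Database: logical server plus a database
    az sql server create --name dataeng-sqlserver --resource-group rg-dataeng \
        --location eastus --admin-user sqladmin --admin-password '<strong-password>'
    az sql db create --resource-group rg-dataeng --server dataeng-sqlserver \
        --name dataengdb --service-objective S0

    # Azure Data Factory (requires the datafactory extension)
    az extension add --name datafactory
    az datafactory create --resource-group rg-dataeng --name dataeng-adf --location eastus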

Software Dependencies

The project uses the following software and libraries:

  • Azure Data Factory: For orchestrating data workflows.
  • Azure Data Lake Storage: For data lake storage.
  • Azure SQL Database: For relational database storage.
  • Azure Databricks: For scalable data processing and machine learning.
  • Azure Synapse Analytics: For data warehousing and big data analytics.
  • Python: For scripting and data manipulation.
  • Visual Studio Code: For code editing and debugging.

Latest Releases

Release notes and updates can be found in the releases section.

API References

Refer to the official Azure documentation for detailed API references for Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.

Build and Test

Building the Project

  1. Deploy the Azure resources using the provided ARM templates:

    az deployment group create --resource-group <your-resource-group> --template-file azuredeploy.json
  2. Configure your Azure services by editing the config.json file with your Azure resource details (a possible layout is sketched below).
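
The layout of config.json is not documented in this section, so the snippet below is only a hypothetical example of the resource details it might contain; every key and value is a placeholder.

    # Hypothetical config.json layout -- all keys and values are placeholders
    cat > config.json <<'EOF'
    {
      "resourceGroup": "rg-dataeng",
      "dataFactoryName": "dataeng-adf",
      "storageAccountName": "dataengadls",
      "sqlServerName": "dataeng-sqlserver",
      "sqlDatabaseName": "dataengdb",
      "databricksWorkspace": "dataeng-dbw",
      "synapseWorkspace": "dataeng-synapse"
    }
    EOF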

Running Tests

Run the CI/CD pipelines in Azure DevOps against the Git repository to verify that the data pipeline works end-to-end.
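
One way to trigger such a pipeline from the command line is the Azure DevOps CLI extension; the sketch below assumes it, and the organization, project, and pipeline names are placeholders rather than values from this repository.

    # Azure DevOps CLI extension (placeholder organization, project, and pipeline names)
    az extension add --name azure-devops
    az devops configure --defaults organization=https://dev.azure.com/<your-org> project=<your-project>
    az pipelines run --name <your-pipeline> --branch main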

Contribute

Contributions are welcome! Here’s how you can help:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature).
  3. Make your changes and commit them (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/your-feature).
  5. Create a pull request.

Acknowledgements
