This project demonstrates various data transformations using Azure services. The main objective is to show how to build a scalable and efficient data pipeline using Azure Data Factory, Azure Databricks, and Azure Synapse Analytics. The project includes two Logic Apps and implements all of the activities available in Azure Data Factory. The motivation is to provide a comprehensive guide that helps data engineers use Azure's capabilities for data processing and analytics.
Before you begin, ensure you have met the following requirements:
- You have an Azure subscription. If you don't have an Azure account, you can create one on the Azure website.
To set up the project:
- Clone the repository:
  git clone https://github.com/pavithra19/DataEngineeringProject.git
  cd DataEngineeringProject
- Set up the following Azure services: Azure SQL Database, Azure Data Lake Storage, and Azure Data Factory.
- Import the ARM templates: import the ARM template code, then run the pipelines.
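The last step, running the pipelines, can also be triggered outside the Azure portal through the Azure Data Factory REST API's `createRun` operation. The sketch below only builds the request URL. The subscription ID, resource group, factory, and pipeline names are placeholders, and the function name `adf_create_run_url` is hypothetical; sending the authenticated POST request (for example with a token from the `azure-identity` package) is left out.

```python
from urllib.parse import quote

# API version used by the Azure Data Factory (V2) REST API.
ADF_API_VERSION = "2018-06-01"


def adf_create_run_url(subscription_id, resource_group, factory_name, pipeline_name):
    """Return the REST endpoint that starts a run of one Data Factory pipeline.

    Hypothetical helper: the caller still has to POST to this URL with an
    Azure AD bearer token in the Authorization header.
    """
    # Percent-encode every path segment so names containing spaces stay valid.
    sub, rg, factory, pipeline = (
        quote(part, safe="")
        for part in (subscription_id, resource_group, factory_name, pipeline_name)
    )
    return (
        "https://management.azure.com"
        f"/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.DataFactory/factories/{factory}"
        f"/pipelines/{pipeline}/createRun?api-version={ADF_API_VERSION}"
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your own resource names.
    print(adf_create_run_url("00000000-0000-0000-0000-000000000000",
                             "my-resource-group", "my-data-factory", "CopyPipeline"))
```

A successful POST to this URL returns a JSON body containing the `runId` of the new pipeline run, which you can use to monitor the run.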
The project uses the following software and libraries:
- Azure Data Factory: For orchestrating data workflows.
- Azure Data Lake Storage: For data storage.
- Azure SQL Database: For the relational database.
- Azure Databricks: For scalable data processing and machine learning.
- Azure Synapse Analytics: For data warehousing and big data analytics.
- Python: For scripting and data manipulation.
- Visual Studio Code: For code editing and debugging.
Release notes and updates can be found in the releases section.
Refer to the following documentation for detailed API references:
To deploy and run the pipeline:
- Deploy the Azure resources using the provided ARM templates:
  az deployment group create --resource-group <your-resource-group> --template-file azuredeploy.json
- Configure your Azure services by editing the config.json file with your Azure resource details.
- Run the CI/CD pipelines from the Azure DevOps Git repository to verify that the data pipeline works end-to-end.
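The deploy and configure steps can be scripted together. The sketch below assumes `config.json` is a flat JSON object; the key names in `REQUIRED_KEYS` and the helper names `load_config` and `deployment_command` are hypothetical, since the README does not document the file's schema. It validates the file, then builds the same `az deployment group create` call shown above as an argument list.

```python
import json
from pathlib import Path

# Hypothetical key names: adjust them to match the project's actual config.json.
REQUIRED_KEYS = ("resource_group", "data_factory_name", "storage_account_name")


def load_config(path="config.json"):
    """Read config.json and raise ValueError if any required key is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json has no value for: " + ", ".join(missing))
    return config


def deployment_command(config, template="azuredeploy.json"):
    """Build the `az deployment group create` call as a subprocess argument list."""
    return [
        "az", "deployment", "group", "create",
        "--resource-group", config["resource_group"],
        "--template-file", template,
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` runs the deployment, provided the Azure CLI is installed and you have signed in with `az login`. Using an argument list rather than a single shell string avoids quoting problems if a resource group name contains spaces.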
Contributions are welcome! Here’s how you can help:
- Fork the repository.
- Create a new branch (git checkout -b feature/your-feature).
- Make your changes and commit them (git commit -m 'Add some feature').
- Push to the branch (git push origin feature/your-feature).
- Create a pull request.