Azure Data Engineering Project

Introduction

This project demonstrates various data transformations using Azure services. The main objective is to showcase how to build a scalable and efficient data pipeline using Azure Data Factory, Azure Databricks, and Azure Synapse Analytics. It includes two Logic Apps and implements all of the available Azure Data Factory activities. The motivation behind this project is to provide a comprehensive guide for data engineers to leverage Azure's capabilities for data processing and analytics.

Getting Started

Prerequisites

Before you begin, ensure you have met the following requirements:

  • You have an Azure subscription. If you don't have an Azure account, you can create one on the Azure website.
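
If you have the Azure CLI installed, a quick way to confirm you are signed in to the intended subscription is sketched below (the subscription name is a placeholder):

    # Sign in and inspect the active subscription
    az login
    az account show --output table
    # Optionally switch to a specific subscription (placeholder name)
    az account set --subscription "<your-subscription>"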

Installation Process

  1. Clone the repository:

    git clone https://github.com/pavithra19/DataEngineeringProject.git
    cd DataEngineeringProject
  2. Set up the required Azure services (a sketch of the CLI commands follows this list):

    Azure SQL Database
    Azure Data Lake Storage
    Azure Data Factory
  3. Import the ARM templates:

    Import the ARM template code and run the pipelines.
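
The services listed in step 2 can also be provisioned from the Azure CLI. The commands below are only a sketch, not part of this repository; all resource names, the location, and the credentials are placeholders, and the Data Factory commands assume the optional datafactory CLI extension.

    # Resource group to hold all project resources (placeholder names and location)
    az group create --name rg-dataeng --location eastus

    # Azure Data Lake Storage Gen2: storage account with hierarchical namespace enabled
    az storage account create --name dataengadls --resource-group rg-dataeng \
        --location eastus --sku Standard_LRS --kind StorageV2 \
        --enable-hierarchical-namespace true

    # Azure SQL Database: logical server plus a database
    az sql server create --name dataeng-sqlserver --resource-group rg-dataeng \
        --location eastus --admin-user sqladmin --admin-password '<strong-password>'
    az sql db create --resource-group rg-dataeng --server dataeng-sqlserver \
        --name dataengdb --service-objective S0

    # Azure Data Factory (requires the datafactory extension)
    az extension add --name datafactory
    az datafactory create --resource-group rg-dataeng --name dataeng-adf --location eastus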

Software Dependencies

The project uses the following software and libraries:

  • Azure Data Factory: For orchestrating data workflows.
  • Azure Data Lake Storage: For data lake storage.
  • Azure SQL Database: For relational database storage.
  • Azure Databricks: For scalable data processing and machine learning.
  • Azure Synapse Analytics: For data warehousing and big data analytics.
  • Python: For scripting and data manipulation.
  • Visual Studio Code: For code editing and debugging.

Latest Releases

Release notes and updates can be found in the releases section.

API References

Refer to the official Azure documentation for detailed API references for Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.

Build and Test

Building the Project

  1. Deploy the Azure resources using the provided ARM templates:

    az deployment group create --resource-group <your-resource-group> --template-file azuredeploy.json
  2. Configure your Azure services by editing the config.json file with your Azure resource details (a possible layout is sketched below).
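
The layout of config.json is not documented in this section, so the snippet below is only a hypothetical example of the resource details it might contain; every key and value is a placeholder.

    # Hypothetical config.json layout -- all keys and values are placeholders
    cat > config.json <<'EOF'
    {
      "resourceGroup": "rg-dataeng",
      "dataFactoryName": "dataeng-adf",
      "storageAccountName": "dataengadls",
      "sqlServerName": "dataeng-sqlserver",
      "sqlDatabaseName": "dataengdb",
      "databricksWorkspace": "dataeng-dbw",
      "synapseWorkspace": "dataeng-synapse"
    }
    EOF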

Running Tests

Run the CI/CD pipelines in Azure DevOps against the Git repository to verify that the data pipeline works end-to-end.
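
One way to trigger such a pipeline from the command line is the Azure DevOps CLI extension; the sketch below assumes it, and the organization, project, and pipeline names are placeholders rather than values from this repository.

    # Azure DevOps CLI extension (placeholder organization, project, and pipeline names)
    az extension add --name azure-devops
    az devops configure --defaults organization=https://dev.azure.com/<your-org> project=<your-project>
    az pipelines run --name <your-pipeline> --branch main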

Contribute

Contributions are welcome! Here’s how you can help:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature).
  3. Make your changes and commit them (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/your-feature).
  5. Create a pull request.

Acknowledgements
