Commit 7fd18ae
Merge pull request #218 from databrickslabs/feature/v0.0.10
Feature/v0.0.10
2 parents 15abb7c + d87900f commit 7fd18ae

28 files changed: +730 −225 lines changed

README.md

Lines changed: 23 additions & 23 deletions
@@ -29,13 +29,13 @@ In practice, a single generic pipeline reads the Dataflowspec and uses it to orc
 - Capture [Data Quality Rules](https://github.com/databrickslabs/dlt-meta/tree/main/examples/dqe/customers/bronze_data_quality_expectations.json)
 - Capture processing logic as sql in [Silver transformation file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/silver_transformations.json)

-#### Generic DLT pipeline
+#### Generic Lakeflow Declarative Pipeline

 - Apply appropriate readers based on input metadata
 - Apply data quality rules with DLT expectations
 - Apply CDC apply changes if specified in metadata
-- Builds DLT graph based on input/output metadata
-- Launch DLT pipeline
+- Builds Lakeflow Declarative Pipeline graph based on input/output metadata
+- Launch Lakeflow Declarative Pipeline

 ## High-Level Process Flow:
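For orientation, here is a minimal sketch of the pipeline pattern those bullets describe, using the `dlt` Python API inside a pipeline notebook (where `spark` is predefined). The table names, source path, and keys are illustrative assumptions, not dlt-meta's actual generated code:

```python
import dlt
from pyspark.sql.functions import col

# Bronze: read raw files with Auto Loader and enforce data quality expectations
@dlt.table(name="bronze_customers")
@dlt.expect_all_or_drop({"valid_id": "customer_id IS NOT NULL"})
def bronze_customers():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/demo/raw/customers")  # illustrative path
    )

# Silver: merge CDC records from bronze into a streaming target table
dlt.create_streaming_table("silver_customers")
dlt.apply_changes(  # newer runtimes name this create_auto_cdc_flow
    target="silver_customers",
    source="bronze_customers",
    keys=["customer_id"],
    sequence_by=col("event_ts"),  # illustrative ordering column
)
```

dlt-meta generates the equivalent of this wiring from the Dataflowspec metadata rather than hand-written code.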

@@ -53,14 +53,15 @@ In practice, a single generic pipeline reads the Dataflowspec and uses it to orc
 | Custom transformations | Bronze, Silver layer accepts custom functions|
 | Data Quality Expectations Support | Bronze, Silver layer |
 | Quarantine table support | Bronze layer |
-| [apply_changes](https://docs.databricks.com/en/delta-live-tables/python-ref.html#cdc) API support | Bronze, Silver layer |
-| [apply_changes_from_snapshot](https://docs.databricks.com/en/delta-live-tables/python-ref.html#change-data-capture-from-database-snapshots-with-python-in-delta-live-tables) API support | Bronze layer|
+| [create_auto_cdc_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes) API support | Bronze, Silver layer |
+| [create_auto_cdc_from_snapshot_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes-from-snapshot) API support | Bronze layer|
 | [append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#use-append-flow-to-write-to-a-streaming-table-from-multiple-source-streams) API support | Bronze layer|
 | Liquid cluster support | Bronze, Bronze Quarantine, Silver tables|
 | [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/) | ```databricks labs dlt-meta onboard```, ```databricks labs dlt-meta deploy``` |
 | Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with ```layer=bronze_silver``` option using Direct publishing mode |
-| [DLT Sinks](https://docs.databricks.com/aws/en/delta-live-tables/dlt-sinks) | Supported formats: external ```delta table```, ```kafka```. Bronze, Silver layers |
+| [create_sink](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-sink) API support | Supported formats: external ```delta table```, ```kafka```. Bronze, Silver layers |
 | [Databricks Asset Bundles](https://docs.databricks.com/aws/en/dev-tools/bundles/) | Supported |
+| [DLT-META UI](https://github.com/databrickslabs/dlt-meta/tree/main/lakehouse_app#dlt-meta-lakehouse-app-setup) | Uses Databricks Lakehouse DLT-META App |

 ## Getting Started
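For the `create_sink` row above, a rough sketch of how the sink API is used in pipeline code; the sink name, topic, bootstrap servers, and source table are placeholder assumptions:

```python
import dlt

# Define an external Kafka sink (name and options are illustrative)
dlt.create_sink(
    name="kafka_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "host:9092",
        "topic": "silver_events",
    },
)

# Route a stream into the sink with an append flow
@dlt.append_flow(name="to_kafka", target="kafka_sink")
def to_kafka():
    return spark.readStream.table("silver_events_st")
```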

@@ -137,38 +138,37 @@ If you want to run existing demo files please follow these steps before running
    dlt_meta_home=$(pwd)
    export PYTHONPATH=$dlt_meta_home
    ```
+   ![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)
+
 7. Run onboarding command:
    ```commandline
    databricks labs dlt-meta onboard
    ```
-   ![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)
-
-   Above commands will prompt you to provide onboarding details. If you have cloned dlt-meta git repo then accept defaults which will launch config from demo folder.
+   The command will prompt you to provide onboarding details. If you have cloned the dlt-meta repository, accept the defaults, which use the configuration from the demo folder.
    ![onboardingDLTMeta_2.gif](docs/static/images/onboardingDLTMeta_2.gif)
-
-- Go to your Databricks workspace and locate the onboarding job under: Workflow -> Job runs
+   The onboard CLI command will:
+   1. Push code and data to your Databricks workspace
+   2. Create an onboarding job
+   3. Display a success message: ```Job created successfully. job_id={job_id}, url=https://{databricks workspace url}/jobs/{job_id}```
+   4. The job URL will automatically open in your default browser.

 ### deploy using dlt-meta CLI:

-- Once the onboarding job is finished, deploy the `bronze` and `silver` DLT pipelines using the command below
+- Once the onboarding job is finished, deploy the Lakeflow Declarative Pipeline using the command below
  ```commandline
  databricks labs dlt-meta deploy
  ```
-- Above command will prompt you to provide dlt details. Please provide the respective details for the schema you provided in the steps above
-- Bronze DLT
-
-![deployingDLTMeta_bronze.gif](docs/static/images/deployingDLTMeta_bronze.gif)
+  The command will prompt you to provide pipeline configuration details.

+![deployingDLTMeta_bronze_silver.gif](docs/static/images/deployingDLTMeta_bronze_silver.gif)

-- Silver DLT
-- ```commandline
-  databricks labs dlt-meta deploy
-  ```
-- Above command will prompt you to provide dlt details. Please provide the respective details for the schema you provided in the steps above
-
-![deployingDLTMeta_silver.gif](docs/static/images/deployingDLTMeta_silver.gif)
+The deploy CLI command will:
+1. Deploy the Lakeflow Declarative Pipeline with dlt-meta configuration such as ```layer```, ```group```, ```dataflowSpec table details``` etc. to your Databricks workspace
+2. Display a message: ```dlt-meta pipeline={pipeline_id} created and launched with update_id={pipeline_update_id}, url=https://{databricks workspace url}/#joblist/pipelines/{pipeline_id}```
+3. The pipeline URL will automatically open in your default browser.

 ## More questions

docs/content/app/_index.md

Lines changed: 0 additions & 38 deletions
This file was deleted.

docs/content/demo/Append_FLOW_CF.md

Lines changed: 16 additions & 4 deletions
@@ -21,15 +21,26 @@ This demo will perform following tasks:
    databricks auth login --host WORKSPACE_HOST
    ```
-3. ```commandline
+3. Install Python package requirements:
+   ```commandline
+   # Core requirements
+   pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+   # Development requirements
+   pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
+   ```
+
+4. Clone dlt-meta:
+   ```commandline
    git clone https://github.com/databrickslabs/dlt-meta.git
    ```
-4. ```commandline
+5. Navigate to project directory:
+   ```commandline
    cd dlt-meta
    ```
-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
    ```commandline
    dlt_meta_home=$(pwd)
    ```
@@ -38,7 +49,8 @@ This demo will perform following tasks:
    export PYTHONPATH=$dlt_meta_home
    ```
-6. ```commandline
+7. Run the command:
+   ```commandline
    python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=dlt_meta_uc
    ```
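For context, this demo exercises the `append_flow` API. A minimal sketch of the pattern — one streaming table fed by multiple Auto Loader sources — where the table name and volume paths are illustrative assumptions:

```python
import dlt

# Target streaming table that several source streams append into
dlt.create_streaming_table("bronze_iot_events")

# Each append flow reads one Auto Loader source (paths are illustrative)
@dlt.append_flow(target="bronze_iot_events")
def main_feed():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/demo/raw/iot_main")
    )

@dlt.append_flow(target="bronze_iot_events")
def backfill_feed():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/demo/raw/iot_backfill")
    )
```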

docs/content/demo/Append_FLOW_EH.md

Lines changed: 17 additions & 5 deletions
@@ -18,21 +18,32 @@ draft: false
    databricks auth login --host WORKSPACE_HOST
    ```
-3. ```commandline
+3. Install Python package requirements:
+   ```commandline
+   # Core requirements
+   pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+   # Development requirements
+   pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
+   ```
+
+4. Clone dlt-meta:
+   ```commandline
    git clone https://github.com/databrickslabs/dlt-meta.git
    ```
-4. ```commandline
+5. Navigate to project directory:
+   ```commandline
    cd dlt-meta
    ```
-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
    ```commandline
    dlt_meta_home=$(pwd)
    ```
    ```commandline
    export PYTHONPATH=$dlt_meta_home
    ```
-6. Eventhub
+7. Configure Eventhub
    - Needs an Eventhub instance running
    - Needs two Eventhub topics: one for the main feed (eventhub_name) and one for the append flow feed (eventhub_name_append_flow)
    - Create a Databricks secrets scope for the Eventhub keys
@@ -61,7 +72,8 @@ draft: false
    - eventhub_secrets_scope_name: Databricks secret scope name e.g. eventhubs_dltmeta_creds
    - eventhub_port: Eventhub port
-7. ```commandline
+8. Run the command:
+   ```commandline
    python demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --uc_catalog_name=dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=dltmeta_eventhub_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
    ```
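The demo reads Eventhub credentials from the secrets scope passed via `--eventhub_secrets_scope_name`. One way to create it with the Databricks CLI — the scope and key names here mirror the example values above and are assumptions, not fixed names:

```commandline
# Create the scope, then store the Eventhub access key in it
databricks secrets create-scope dltmeta_eventhub_creds
databricks secrets put-secret dltmeta_eventhub_creds RootManageSharedAccessKey
```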

docs/content/demo/Apply_Changes_From_Snapshot.md

Lines changed: 16 additions & 4 deletions
@@ -26,21 +26,33 @@ draft: false
    databricks auth login --host WORKSPACE_HOST
    ```
-3. ```commandline
+3. Install Python package requirements:
+   ```commandline
+   # Core requirements
+   pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+   # Development requirements
+   pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
+   ```
+
+4. Clone dlt-meta:
+   ```commandline
    git clone https://github.com/databrickslabs/dlt-meta.git
    ```
-4. ```commandline
+5. Navigate to project directory:
+   ```commandline
    cd dlt-meta
    ```
-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
    ```commandline
    dlt_meta_home=$(pwd)
    ```
    ```commandline
    export PYTHONPATH=$dlt_meta_home
    ```
-6. ```commandline
+7. Run the command:
+   ```commandline
    python demo/launch_acfs_demo.py --uc_catalog_name=<<uc catalog name>>
    ```
    - uc_catalog_name : Unity catalog name
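For context, a minimal sketch of the snapshot-CDC pattern this demo drives from metadata, using the `apply_changes_from_snapshot` Python API (renamed `create_auto_cdc_from_snapshot_flow` in newer releases); the table and key names are illustrative assumptions:

```python
import dlt

# Keep a target table in sync with periodic full snapshots of a source
dlt.create_streaming_table("silver_stores")

dlt.apply_changes_from_snapshot(
    target="silver_stores",
    source="bronze_stores_snapshot",  # snapshot table re-read on each update
    keys=["store_id"],
    stored_as_scd_type=1,  # SCD1 overwrites in place; use 2 to keep history
)
```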

docs/content/demo/DAB.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+---
+title: "DAB Demo"
+date: 2024-02-26T14:25:26-04:00
+weight: 28
+draft: false
+---
+
+### DAB Demo
+
+## Overview
+This demo showcases how to use Databricks Asset Bundles (DABs) with DLT-META.
+
+This demo will perform the following steps:
+- Create dlt-meta schemas for the dataflowspec and bronze/silver layers
+- Upload necessary resources to a Unity Catalog volume
+- Create DAB files with catalog, schema, and file locations populated
+- Deploy the DAB to a Databricks workspace
+- Run onboarding using DAB commands
+- Run Bronze/Silver pipelines using DAB commands
+- Demo examples will showcase the fan-out pattern in the silver layer
+- Demo examples will showcase custom transformations for the bronze/silver layers:
+  - Adding custom columns and metadata to Bronze tables
+  - Implementing SCD Type 1 to Silver tables
+  - Applying expectations to filter data in Silver tables
+
+### Steps:
+1. Launch Command Prompt
+
+2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
+   - Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:
+
+   ```commandline
+   databricks auth login --host WORKSPACE_HOST
+   ```
+
+3. Install Python package requirements:
+   ```commandline
+   # Core requirements
+   pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+   # Development requirements
+   pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
+   ```
+
+4. Clone dlt-meta:
+   ```commandline
+   git clone https://github.com/databrickslabs/dlt-meta.git
+   ```
+
+5. Navigate to project directory:
+   ```commandline
+   cd dlt-meta
+   ```
+
+6. Set python environment variable into terminal:
+   ```commandline
+   dlt_meta_home=$(pwd)
+   export PYTHONPATH=$dlt_meta_home
+   ```
+
+7. Generate DAB resources and set up schemas:
+   This command will:
+   - Generate DAB configuration files
+   - Create DLT-META schemas
+   - Upload necessary files to volumes
+   ```commandline
+   python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name=<your_catalog_name> --profile=<your_profile>
+   ```
+   > Note: If you don't specify `--profile`, you'll be prompted for your Databricks workspace URL and access token.
+
+8. Deploy and run the DAB bundle:
+   - Navigate to the DAB directory:
+   ```commandline
+   cd demo/dabs
+   ```
+
+   - Validate the bundle configuration:
+   ```commandline
+   databricks bundle validate --profile=<your_profile>
+   ```
+
+   - Deploy the bundle to the dev environment:
+   ```commandline
+   databricks bundle deploy --target dev --profile=<your_profile>
+   ```
+
+   - Run the onboarding job:
+   ```commandline
+   databricks bundle run onboard_people -t dev --profile=<your_profile>
+   ```
+
+   - Execute the pipelines:
+   ```commandline
+   databricks bundle run execute_pipelines_people -t dev --profile=<your_profile>
+   ```
+
+![dab_onboarding_job.png](/images/dab_onboarding_job.png)
+![dab_dlt_pipelines.png](/images/dab_dlt_pipelines.png)
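For readers new to DABs, a skeleton of the kind of bundle config the generator produces might look like the following; the bundle name, target, and host are illustrative assumptions, and `demo/generate_dabs_resources.py` creates the real files:

```yaml
# databricks.yml — illustrative skeleton only
bundle:
  name: dlt_meta_dabs_demo

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
```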

docs/content/demo/DAIS.md

Lines changed: 16 additions & 4 deletions
@@ -23,23 +23,35 @@ This demo showcases DLT-META's capabilities of creating Bronze and Silver DLT pi
    databricks auth login --host WORKSPACE_HOST
    ```
-3. ```commandline
+3. Install Python package requirements:
+   ```commandline
+   # Core requirements
+   pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+   # Development requirements
+   pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
+   ```
+
+4. Clone dlt-meta:
+   ```commandline
    git clone https://github.com/databrickslabs/dlt-meta.git
    ```
-4. ```commandline
+5. Navigate to project directory:
+   ```commandline
    cd dlt-meta
    ```
-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
    ```commandline
    dlt_meta_home=$(pwd)
    ```
    ```commandline
    export PYTHONPATH=$dlt_meta_home
    ```
-6. ```commandline
+7. Run the command:
+   ```commandline
    python demo/launch_dais_demo.py --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=<<>>
    ```
    - uc_catalog_name : Unity catalog name
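A filled-in invocation, reusing the illustrative values from the other demos (`dlt_meta_uc` and `aws` are assumptions; substitute your own):

```commandline
python demo/launch_dais_demo.py --uc_catalog_name=dlt_meta_uc --cloud_provider_name=aws
```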
