Commit 15abb7c

Merge pull request #202 from databrickslabs/feature/v0.0.10
Feature/v0.0.10
2 parents ee9a895 + 387abbb commit 15abb7c

File tree

111 files changed: +9976 additions, -817 deletions

.coveragerc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [run]
 branch = True
-command_line = -m unittest
+command_line = -m pytest tests/
 include = src/*.py
 omit =
     */site-packages/*
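
The switch from `unittest` to `pytest` rides on coverage.py's `command_line` option (available since coverage 5.0), which supplies the default command for a bare `coverage run`. A quick sketch of the resulting equivalence, as illustrative local usage rather than part of the commit:

```commandline
# With the .coveragerc above, these two invocations are equivalent:
python -m coverage run                    # picks up command_line from .coveragerc
python -m coverage run -m pytest tests/   # explicit form, as used in CI below
coverage report                           # summarize, honoring include/omit
```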

.github/workflows/onpush.yml

Lines changed: 13 additions & 12 deletions
@@ -18,6 +18,12 @@ jobs:
     steps:
       - uses: actions/checkout@v1

+      - name: Set up Java
+        uses: actions/setup-java@v3
+        with:
+          distribution: 'temurin'
+          java-version: '11'
+
       - name: Set up Python ${{ matrix.python-version }}
         uses: actions/setup-python@v4
         with:
@@ -40,25 +46,20 @@ jobs:
       - name: Lint
         run: flake8

-      - name: set spark local
-        run: export SPARK_LOCAL_IP=127.0.0.1
-
-      - name: set spark executor memory
-        run: export SPARK_EXECUTOR_MEMORY=8g
-
-      - name: set spark driver memory
-        run: export SPARK_DRIVER_MEMORY=8g
-
-      - name: set javaopts
-        run: export JAVA_OPTS="-Xmx10g -XX:+UseG1GC"
+      - name: Set environment variables
+        run: |
+          echo "SPARK_LOCAL_IP=127.0.0.1" >> $GITHUB_ENV
+          echo "SPARK_EXECUTOR_MEMORY=8g" >> $GITHUB_ENV
+          echo "SPARK_DRIVER_MEMORY=8g" >> $GITHUB_ENV
+          echo "JAVA_OPTS=-Xmx10g -XX:+UseG1GC" >> $GITHUB_ENV

       - name: Print System Information
         run: |
           python -c "import psutil; import os;
           print(f'Physical Memory: {psutil.virtual_memory().total / 1e9:.2f} GB'); print(f'CPU Cores: {os.cpu_count()}')"

       - name: Run Unit Tests
-        run: python -m coverage run
+        run: python -m coverage run -m pytest tests/ -v

       - name: Publish test coverage
         if: startsWith(matrix.os,'ubuntu')
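
The consolidated environment step also fixes a real pitfall: every `run:` block executes in its own shell, so the old per-step `export` commands never reached the test step. Appending `KEY=value` lines to the file referenced by `$GITHUB_ENV` is the GitHub Actions mechanism for persisting variables to all subsequent steps of a job; a minimal sketch:

```commandline
# Within a single workflow step:
export SPARK_LOCAL_IP=127.0.0.1                   # lost when this step's shell exits
echo "SPARK_LOCAL_IP=127.0.0.1" >> "$GITHUB_ENV"  # visible to every later step
# Any later step can then read $SPARK_LOCAL_IP like a normal environment variable.
```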

.gitignore

Lines changed: 1 addition & 3 deletions
@@ -156,6 +156,4 @@ demo/conf/onboarding.json
 integration_tests/conf/onboarding*.json
 demo/conf/onboarding*.json
 integration_test_output*.csv
-databricks.yml
-oboarding_job_details.json
-
+onboarding_job_details.json

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -1,4 +1,25 @@
 # Changelog
+
+## [v0.0.10]
+### Added
+- Added apply_changes_from_snapshot support in silver layer [PR](https://github.com/databrickslabs/dlt-meta/pull/187)
+- Added UI using Databricks Lakehouse App for onboarding/deploy commands [PR](https://github.com/databrickslabs/dlt-meta/pull/168)
+- Added support for non-DLT sinks (delta, kafka) [PR](https://github.com/databrickslabs/dlt-meta/pull/157)
+- Added quarantine support in silver layer for data quality rules [PR](https://github.com/databrickslabs/dlt-meta/pull/191)
+- Added support for table comments, column comments, and cluster_by [PR](https://github.com/databrickslabs/dlt-meta/pull/91)
+- Added catalog support for sourceDetails and targetDetails [PR](https://github.com/databrickslabs/dlt-meta/issues/173)
+- Added DBDemos for dlt-meta [PR](https://github.com/databrickslabs/dlt-meta/issues/183)
+- Added YAML support for onboarding [PR](https://github.com/databrickslabs/dlt-meta/issues/184)
+- Fixed issue: cluster_by not working with bronze append-only table [PR](https://github.com/databrickslabs/dlt-meta/issues/197)
+- Fixed issue: view name containing a period when using DPM [PR](https://github.com/databrickslabs/dlt-meta/issues/169)
+- Fixed issue: CLI onboarding overwrite option always set to True [PR](https://github.com/databrickslabs/dlt-meta/issues/163)
+- Fixed issue: Silver DLT not created based on passed database [PR](https://github.com/databrickslabs/dlt-meta/issues/160)
+- Fixed issue: PyPI download stats display [PR](https://github.com/databrickslabs/dlt-meta/issues/200)
+- Fixed issue: Silver data quality not working [PR](https://github.com/databrickslabs/dlt-meta/issues/156)
+- Fixed issue: removed DPM flag check inside dataflowpipeline [PR](https://github.com/databrickslabs/dlt-meta/issues/177)
+- Fixed issue: updated dlt-meta demos in the Delta Live Tables Notebooks github repo [PR](https://github.com/databrickslabs/dlt-meta/issues/158)
+
+
 ## [v.0.0.9]
 - Added apply_changes_from_snapshot api support in bronze layer: [PR](https://github.com/databrickslabs/dlt-meta/pull/124)
 - Added dlt append_flow api support for silver layer: [PR](https://github.com/databrickslabs/dlt-meta/pull/63)

README.md

Lines changed: 35 additions & 45 deletions
@@ -11,38 +11,15 @@

 ---

-<p align="left">
-    <a href="https://databrickslabs.github.io/dlt-meta/">
-        <img src="https://img.shields.io/badge/DOCS-PASSING-green?style=for-the-badge" alt="Documentation Status"/>
-    </a>
-    <a href="https://pypi.org/project/dlt-meta/">
-        <img src="https://img.shields.io/badge/PYPI-v%200.0.9-green?style=for-the-badge" alt="Latest Python Release"/>
-    </a>
-    <a href="https://github.com/databrickslabs/dlt-meta/actions/workflows/onpush.yml">
-        <img src="https://img.shields.io/github/workflow/status/databrickslabs/dlt-meta/build/main?style=for-the-badge"
-             alt="GitHub Workflow Status (branch)"/>
-    </a>
-    <a href="https://codecov.io/gh/databrickslabs/dlt-meta">
-        <img src="https://img.shields.io/codecov/c/github/databrickslabs/dlt-meta?style=for-the-badge&amp;token=2CxLj3YBam"
-             alt="codecov"/>
-    </a>
-    <a href="https://pypistats.org/packages/dl-meta">
-        <img src="https://img.shields.io/pypi/dm/dlt-meta?style=for-the-badge" alt="downloads"/>
-    </a>
-    <a href="https://github.com/PyCQA/flake8">
-        <img src="https://img.shields.io/badge/FLAKE8-FLAKE8-lightgrey?style=for-the-badge"
-             alt="We use flake8 for formatting"/>
-    </a>
-</p>
-
-[![lines of code](https://tokei.rs/b1/github/databrickslabs/dlt-meta)](<[https://codecov.io/github/databrickslabs/dlt-meta](https://github.com/databrickslabs/dlt-meta)>)
+[![Documentation](https://img.shields.io/badge/docs-passing-green)](https://databrickslabs.github.io/dlt-meta/) [![PyPI](https://img.shields.io/badge/pypi-v0.0.9-green)](https://pypi.org/project/dlt-meta/) [![Build](https://img.shields.io/github/workflow/status/databrickslabs/dlt-meta/build/main)](https://github.com/databrickslabs/dlt-meta/actions/workflows/onpush.yml) [![Coverage](https://img.shields.io/codecov/c/github/databrickslabs/dlt-meta)](https://codecov.io/gh/databrickslabs/dlt-meta) [![Style](https://img.shields.io/badge/code%20style-flake8-blue)](https://github.com/PyCQA/flake8) [![PyPI Downloads](https://static.pepy.tech/badge/dlt-meta/month)](https://pepy.tech/projects/dlt-meta)

 ---

+
 # Project Overview
-`DLT-META` is a metadata-driven framework designed to work with [Delta Live Tables](https://www.databricks.com/product/delta-live-tables). This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.
+`DLT-META` is a metadata-driven framework designed to work with [Lakeflow Declarative Pipelines](https://www.databricks.com/product/data-engineering/lakeflow-declarative-pipelines). This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.

-In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow
+In practice, a single generic pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow

 ### Components:
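To make the Dataflowspec concrete, a hypothetical minimal onboarding entry is sketched below as a heredoc. The field names are illustrative only, not the framework's authoritative schema; consult the dlt-meta onboarding documentation for the real layout.

```commandline
# Hypothetical minimal onboarding entry; field names are illustrative only
cat > conf/onboarding.json <<'EOF'
[
  {
    "data_flow_id": "100",
    "data_flow_group": "A1",
    "source_format": "cloudFiles",
    "source_details": {"source_path_dev": "s3://example-bucket/raw/customers"},
    "bronze_table": "customers"
  }
]
EOF
```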

@@ -82,6 +59,8 @@ In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to
 | Liquid cluster support | Bronze, Bronze Quarantine, Silver tables|
 | [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/) | ```databricks labs dlt-meta onboard```, ```databricks labs dlt-meta deploy``` |
 | Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with ```layer=bronze_silver``` option using Direct publishing mode |
+| [DLT Sinks](https://docs.databricks.com/aws/en/delta-live-tables/dlt-sinks) | Supported formats: external ```delta table```, ```kafka```. Bronze, Silver layers |
+| [Databricks Asset Bundles](https://docs.databricks.com/aws/en/dev-tools/bundles/) | Supported |

 ## Getting Started
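For the new Databricks Asset Bundles row, deployment follows the standard bundle workflow of the Databricks CLI. An illustrative sequence, assuming a `databricks.yml` bundle configuration already exists in the project:

```commandline
# Illustrative bundle workflow; assumes an existing databricks.yml bundle config
databricks bundle validate        # check the bundle configuration
databricks bundle deploy -t dev   # deploy bundle resources to the "dev" target
```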

@@ -121,36 +100,47 @@ databricks auth login --host WORKSPACE_HOST

 If you want to run existing demo files please follow these steps before running onboard command:

-```commandline
+1. Clone dlt-meta:
+```commandline
 git clone https://github.com/databrickslabs/dlt-meta.git
-```
+```

-```commandline
+2. Navigate to project directory:
+```commandline
 cd dlt-meta
-```
+```

-```commandline
+3. Create Python virtual environment:
+```commandline
 python -m venv .venv
-```
+```

-```commandline
+4. Activate virtual environment:
+```commandline
 source .venv/bin/activate
-```
+```

-```commandline
-pip install databricks-sdk
-```
+5. Install required packages:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install delta-spark==3.0.0 pyspark==3.5.5 pytest>=7.0.0 coverage>=7.0.0
+
+# Integration test requirements
+pip install "typer[all]==0.6.1"
+```

-```commandline
+6. Set environment variables:
+```commandline
 dlt_meta_home=$(pwd)
-```
-
-```commandline
 export PYTHONPATH=$dlt_meta_home
-```
-```commandline
+```
+7. Run onboarding command:
+```commandline
 databricks labs dlt-meta onboard
-```
+```
 ![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)
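
Before step 7, an optional sanity check can confirm the environment is wired correctly. These commands are illustrative and not part of the documented steps:

```commandline
# Optional, illustrative sanity checks before onboarding
python -c "import pyspark; print(pyspark.__version__)"   # expect 3.5.5
echo $PYTHONPATH                                          # should print the repo root
databricks auth describe                                  # confirm workspace auth
```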
