This repository was archived by the owner on Mar 13, 2020. It is now read-only.
Releases: pageuppeople-opensource/data-pipeline-orchestrator
Add ability to accept initial execution id
The init-execution command now accepts an optional --execution-id parameter where users can provide a GUID themselves.
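The optional-GUID behaviour described above can be sketched roughly as follows. This is an illustration only, not the project's code: the helper name resolve_execution_id is hypothetical, and generating a random GUID when none is supplied is an assumption about how the default is produced.

```python
import uuid


def resolve_execution_id(execution_id=None):
    """Return a canonical GUID string for an execution (hypothetical helper)."""
    if execution_id is None:
        # Assumption: without --execution-id, a fresh random GUID is generated.
        return str(uuid.uuid4())
    # uuid.UUID raises ValueError when the supplied value is not a valid GUID.
    return str(uuid.UUID(execution_id))
```

Validating a user-supplied value up front means a malformed id fails fast rather than being stored as the execution's identifier.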
Start logging stats upon completion of execution and its steps
v0.1.5-beta Merge pull request #27 from PageUpPeopleOrg/feature/log-stats-upon-co…
Improve integration tests
Merges
- Run tests using a restricted test user instead of admin (#26)
Add ability to track execution steps and statistics
- introduce a new entity execution_step to track each of the data pipeline execution's steps like LOAD, TRANSFORM, etc.
- update type of execution_time_ms to store up to PostgreSQL 'BIGINT' type
- update integration tests to cover all changes
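A rough sketch of why a millisecond duration benefits from a 64-bit column. The function name execution_time_ms_between is hypothetical, and the comparison with a 4-byte INTEGER is for illustration only; the release notes do not state what the previous column type was.

```python
from datetime import datetime, timedelta, timezone

# Upper bounds of PostgreSQL's 4-byte INTEGER and 8-byte BIGINT types.
PG_INTEGER_MAX = 2_147_483_647
PG_BIGINT_MAX = 9_223_372_036_854_775_807


def execution_time_ms_between(started_on, completed_on):
    """Whole milliseconds elapsed between two timezone-aware datetimes."""
    return (completed_on - started_on) // timedelta(milliseconds=1)


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
thirty_days_ms = execution_time_ms_between(start, start + timedelta(days=30))
```

Here thirty_days_ms exceeds PG_INTEGER_MAX (roughly 24.8 days of milliseconds) but sits far below PG_BIGINT_MAX.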
Rename execution tracking tables
Merges [OSC-1302] Rename tables to match rdl (#22)
Rename to DPO, Integration tests and Alembic
Changes
Add Alembic
Rename MCD to DPO
- formerly "model-change-detector" now "data-pipeline-orchestrator"
Add Integration Tests
Add coverage for all new commands in integration tests
- add coverage for all new commands in integration tests
- make integration tests re-runnable
- apply DRY to assist multiple execution iterations
- log passing tests post assertion
- move from plain Bourne shell to Bash shell since we now use a 3rd-party gist to generate UUIDs to allow us to re-run integration tests on dev machines
Refactor commands to support better state management
- renamed the below commands:
  - init command to init-execution
  - complete command to complete-execution
    - this now also calculates the overall execution time between init-execution and complete-execution
- added the below commands:
  - get-last-successful-execution: Finds the last successful data pipeline execution. Returns an execution-id which is a GUID identifier of that execution, if found; else returns an empty string.
  - get-execution-last-updated-timestamp: Returns the last-updated-on ISO 8601 datetime with timezone of the given execution-id. Raises an error if the given execution-id is invalid.
- split compare command into:
  - persist-models: Saves models of the given model-type within the given execution-id by persisting hashed checksums of the given models.
  - compare-models: Compares the hashed checksums of models between two executions. Returns comma-separated string of changed model names.
    - this now returns all models when all models have changed OR during first execution instead of the previous *
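The compare-models behaviour can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: the function name compare_models, the dict-of-checksums inputs, and the alphabetical ordering of the result are all hypothetical, and how removed models are treated is not covered by the release notes (this sketch ignores them).

```python
def compare_models(previous, current):
    """Return a comma-separated string of changed model names.

    previous and current map model name -> hashed checksum (assumed shape).
    """
    if not previous:
        # First execution: nothing to compare against, so every model counts as changed.
        changed = sorted(current)
    else:
        # A model is changed when it is new or its checksum differs.
        changed = sorted(
            name for name, checksum in current.items()
            if previous.get(name) != checksum
        )
    return ",".join(changed)
```

When every checksum differs, the same code path naturally lists all model names, which matches the note about returning all models rather than *.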
Add new commands - 'compare' and 'complete' data pipeline execution
New commands:
- compare: Compares & persists SHA256-hashed checksums of the given models against those of the last successful execution. Returns comma-separated string of changed model names. Parameters required:
  - execution-id: a GUID identifier of an existing data pipeline execution as returned by the init command.
  - model-type: type of models being processed e.g.: load, transform, etc. This model-type is used to group the model checksums and to find and compare older ones.
  - base-path: absolute or relative path to the models e.g.: ./load, /home/local/load, C:/path/to/load
  - model-patterns: path-based patterns (relative to base-path) to different models with extensions. Models within a model-type must be named uniquely regardless of their file extension. e.g.: *.txt, **/*.txt, ./relative/path/to/some_models/**/*.csv, relative/path/to/some/more/related/models/**/*.sql
- complete: Marks the completion of an existing execution by updating a record for the same in the given database. Returns nothing unless there's an error. Parameter required:
  - execution-id: a GUID identifier of an existing data pipeline execution as returned by the init command.
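How base-path and model-patterns might combine into SHA256 checksums can be sketched as follows. This is an assumption-laden illustration: the function name checksum_models is hypothetical, hashing the raw file bytes is an assumption, and keying by the file name without its extension is inferred from the rule that models must be named uniquely regardless of extension.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_models(base_path, model_patterns):
    """Map model name -> SHA256 hex digest for files matching the patterns."""
    base = Path(base_path)
    checksums = {}
    for pattern in model_patterns:
        # Patterns are resolved relative to base_path; "**" recurses into subfolders.
        for path in sorted(base.glob(pattern)):
            if path.is_file():
                # Assumption: the model name is the file name without its extension.
                checksums[path.stem] = hashlib.sha256(path.read_bytes()).hexdigest()
    return checksums


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "orders.sql").write_bytes(b"select 1")
    (Path(tmp) / "nested" / "customers.sql").write_bytes(b"select 2")
    demo = checksum_models(tmp, ["*.sql", "**/*.sql"])
```

Comparing such a mapping with the one stored for the last successful execution yields the changed model names.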
Support to start a new data pipeline execution
v0.0.1-alpha