Add scheduler components and installation instructions #1503
Conversation
@@ -0,0 +1,24 @@
+# Installing the Usage Collection Scheduler
Moving the markdown inside the /docs/remorph/docs/assessment folder will help with auto document generation.
Great idea 👍
"""A scheduler that executes Pipeline steps according to predefined schedules."""

def __init__(self, pipelines: list[PipelineClass],
             db_path: str = "pipeline_steps.duckdb",
instead of the path, can we pass a connection object?
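Something like the following sketch, perhaps. The class shape and the `record_step` helper are hypothetical, not from this PR; the point is that the caller supplies a live connection and owns its lifecycle, and any DB-API-style connection with `.execute()` would work.

```python
from typing import Any


class PipelineScheduler:
    """Hypothetical sketch: accept a connection object instead of a db_path."""

    def __init__(self, pipelines: list, conn: Any):
        # The caller owns the connection's lifecycle; the scheduler only uses it.
        self.conn = conn
        self.pipelines = pipelines

    def record_step(self, step_name: str) -> None:
        # Works with any DB-API-style connection (duckdb, sqlite3, ...).
        self.conn.execute(
            "INSERT INTO pipeline_steps (step) VALUES (?)", (step_name,)
        )
```

This also makes the scheduler trivial to test, since a throwaway in-memory connection can be injected.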
Or use a singleton for DuckDB connection management and state management.
Added a class, DuckDBManager, to manage the connection to the DuckDB database. But perhaps we should move it to the connections module instead, and even inherit from the DatabaseManager parent class. While it isn't strictly a source database/data warehouse, all of the abstract methods make sense. What do you think?
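For reference, the singleton pattern suggested here could look roughly like the sketch below. It uses `sqlite3` from the standard library purely so the snippet is self-contained; the real `DuckDBManager` would call `duckdb.connect` instead, and the method names are assumptions, not taken from this PR.

```python
import sqlite3


class DuckDBManager:
    """Sketch of a singleton connection manager.

    Shown with sqlite3 so the example is runnable anywhere; the actual
    class would wrap duckdb.connect(db_path) instead.
    """

    _instance = None

    def __new__(cls, db_path: str = ":memory:"):
        # Create the connection once; every later call returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.conn = sqlite3.connect(db_path)
        return cls._instance

    @property
    def connection(self):
        return self.conn
```

With this shape, state management lives in one place and callers never juggle file paths themselves.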
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine, Result, URL
from snowflake.sqlalchemy import URL as SnowflakeURL
I always go back and forth on this: using the Snowflake Connector library vs. vanilla pyodbc with SQLAlchemy (less library management, among other advantages).
Should I pull this out of this PR for now? I only added it since I was adding a Postgres connector anyway.
Leave it as is; we will add it to NOTICE in a separate PR.
@goodwillpunning can you recreate the PR directly, not from a fork, so our CI runs?
@goodwillpunning can we close this one now?
This commit adds a new scheduler component for executing a usage collection against a target Enterprise Data Warehouse (EDW).
The scheduler module contains:
- An entry point (`usage_collector.py`) for instantiating the `PipelineClass` and running usage collection queries
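A minimal sketch of what such an entry point might look like. Everything here except the names `PipelineClass` and `usage_collector.py` is an assumption for illustration, not taken from the PR; the stand-in `execute` method just echoes step names where the real one would run usage-collection queries against the target EDW.

```python
class PipelineClass:
    """Stand-in for the pipeline class the entry point instantiates."""

    def __init__(self, steps: list[str]):
        self.steps = steps

    def execute(self) -> list[str]:
        # Real implementation: run each usage-collection query against the EDW.
        # Here we simply echo the step names to keep the sketch runnable.
        return [f"ran {step}" for step in self.steps]


def main() -> list[str]:
    # Hypothetical step names; the actual schedule comes from configuration.
    pipeline = PipelineClass(steps=["collect_query_history", "collect_warehouse_usage"])
    return pipeline.execute()
```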