@goodwillpunning
Contributor

This commit adds a new scheduler component for executing a usage collection against a target Enterprise Data Warehouse (EDW).

The scheduler module contains:

  • A unit file and timer file for Linux hosts
  • A plist file for macOS hosts
  • An entry point class (usage_collector.py) for instantiating the PipelineClass and running usage collection queries
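As a rough illustration of the entry point described in the list above, here is a minimal sketch. It is not the code from this PR: the `PipelineClass` stand-in, its `execute` method, the `build_pipeline` helper, and the `--warehouse` flag are all assumptions made up for the example, and the real `usage_collector.py` may look quite different.

```python
import argparse
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class PipelineClass:
    """Stand-in for the project's PipelineClass; the real constructor may differ."""

    name: str
    steps: list[Callable[[], str]] = field(default_factory=list)

    def execute(self) -> list[str]:
        # Run each step in order and collect its result.
        return [step() for step in self.steps]


def build_pipeline(warehouse: str) -> PipelineClass:
    # Hypothetical step: a real collector would run usage queries against the EDW.
    return PipelineClass(
        name=f"{warehouse}_usage",
        steps=[lambda: f"collected usage from {warehouse}"],
    )


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Run one usage collection pass.")
    # The --warehouse flag is an assumption for this sketch.
    parser.add_argument("--warehouse", required=True, help="Target EDW name")
    args = parser.parse_args(argv)
    for line in build_pipeline(args.warehouse).execute():
        print(line)
    return 0
```

A real script would call `main()` under an `if __name__ == "__main__":` guard, so that the systemd timer or launchd plist only needs to invoke the script with its arguments on each scheduled run.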

@@ -0,0 +1,24 @@
# Installing the Usage Collection Scheduler

Collaborator

Moving the markdown into the /docs/remorph/docs/assessment folder will help with automatic documentation generation.

Contributor Author

Great idea 👍

"""A scheduler that executes Pipeline steps according to predefined schedules."""

def __init__(self, pipelines: list[PipelineClass],
             db_path: str = "pipeline_steps.duckdb",
Collaborator

Instead of the path, can we pass a connection object?

Collaborator

Or use a singleton for DuckDB connection management and state management.

Contributor Author

Added a class, DuckDBManager, to manage the connection to the DuckDB database. But perhaps we should move this to the connections module instead and even inherit from the DatabaseManager parent class. While it isn't necessarily a source database or data warehouse, all of the abstract methods make sense. What do you think?
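For readers following this thread, the two suggestions (pass a connection object rather than a path, and manage the DuckDB connection through a singleton) could be combined roughly as below. This is only a sketch under stated assumptions, not the class added in the PR: the `get` and `reset` methods, the injectable `connect` parameter, and the `PipelineScheduler` name and signature are hypothetical. The only real library call is `duckdb.connect`.

```python
import threading
from typing import Any, Callable, Optional


def _duckdb_connect(db_path: str) -> Any:
    # Imported lazily so this sketch loads even where duckdb is not installed.
    import duckdb

    return duckdb.connect(db_path)


class DuckDBManager:
    """Holds one shared connection per process (hypothetical shape, not the PR's code)."""

    _instance: Optional["DuckDBManager"] = None
    _lock = threading.Lock()

    def __init__(self, db_path: str, connection: Any) -> None:
        self.db_path = db_path
        self.connection = connection

    @classmethod
    def get(
        cls,
        db_path: str = "pipeline_steps.duckdb",
        connect: Callable[[str], Any] = _duckdb_connect,
    ) -> "DuckDBManager":
        # The first caller opens the connection; later callers reuse it,
        # and their db_path argument is ignored.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(db_path, connect(db_path))
            return cls._instance

    @classmethod
    def reset(cls) -> None:
        # Close and forget the shared connection (useful in tests).
        with cls._lock:
            if cls._instance is not None:
                cls._instance.connection.close()
                cls._instance = None


class PipelineScheduler:
    """Receives a ready connection instead of a database path (assumed signature)."""

    def __init__(self, pipelines: list, connection: Any) -> None:
        self.pipelines = pipelines
        self.connection = connection
```

With this shape, a caller would write something like `PipelineScheduler(pipelines, DuckDBManager.get().connection)`, and tests can inject a fake `connect` callable so that no database file is created.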


from sqlalchemy import create_engine
from sqlalchemy.engine import Engine, Result, URL
from snowflake.sqlalchemy import URL as SnowflakeURL
Collaborator

I always go back and forth on this: using the Snowflake connector library vs. using vanilla pyodbc with SQLAlchemy.
Fewer libraries to manage, among other advantages.

Contributor Author

Should I pull this out of this PR for now? I only added it since I was adding a Postgres connector anyway.

Collaborator

Leave it as is; we will add it to NOTICE in a separate PR.

@gueniai gueniai added the feat/profiler Issues related to profilers label Apr 15, 2025
@sundarshankar89 sundarshankar89 changed the title [WIP] Add scheduler components and installation instructions Add scheduler components and installation instructions Apr 15, 2025
@sundarshankar89
Collaborator

@goodwillpunning can you recreate the PR directly, not from a fork, so our CI runs?

@gueniai
Collaborator

gueniai commented Apr 25, 2025

@goodwillpunning can we close this one now?

@goodwillpunning
Contributor Author

Closing this in favor of #1520 and #1522.

@goodwillpunning goodwillpunning deleted the feature/add_usage_collection_scheduler branch April 25, 2025 14:59
Labels

  • do-not-merge
  • enhancement (New feature or request)
  • feat/profiler (Issues related to profilers)

3 participants