Add scheduler components and installation instructions #1503

goodwillpunning · 2025-03-27T21:14:06Z

This commit adds a new scheduler component for executing a usage collection against a target Enterprise Data Warehouse (EDW).

The scheduler module contains:

- A unit file and timer file for Linux hosts
- A plist file for macOS hosts
- An entry point class (usage_collector.py) for instantiating the PipelineClass and running usage collection queries

sundarshankar89 · 2025-04-01T06:32:34Z

src/databricks/labs/remorph/assessments/scheduler/install/README.md

@@ -0,0 +1,24 @@
+# Installing the Usage Collection Scheduler
+


Moving the markdown into the /docs/remorph/docs/assessment folder will help with automatic document generation.

Great idea 👍

sundarshankar89 · 2025-04-09T05:29:27Z

src/databricks/labs/remorph/assessments/scheduler/pipeline_scheduler.py

+    """A scheduler that executes Pipeline steps according to predefined schedules."""
+
+    def __init__(self, pipelines: list[PipelineClass],
+                 db_path: str = "pipeline_steps.duckdb",


Instead of the path, can we pass a connection object? Or use a singleton for DuckDB connection management and state management?

Added a class, DuckDBManager, to manage the connection to the DuckDB database. But perhaps we should move this to the connections module instead, and even inherit from the DatabaseManager parent class. While DuckDB isn't necessarily a source database or data warehouse, all of the abstract methods make sense. What do you think?
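For readers following the thread, here is a minimal sketch of the two ideas discussed above: a singleton that owns one shared connection, and a consumer that receives a connection object instead of a path. It is not the DuckDBManager from this PR. The names `SharedConnection` and `StepStateStore` and the `step_state` table are hypothetical, and the stdlib `sqlite3` module stands in for DuckDB so the example runs without extra dependencies; passing `duckdb.connect` as the `connect` argument should work the same way, assuming the `duckdb` package is installed.

```python
import sqlite3
import threading
from typing import Any, Callable, Optional


class SharedConnection:
    """Hypothetical singleton that owns one database connection per process."""

    _instance: Optional["SharedConnection"] = None
    _lock = threading.Lock()

    def __init__(self, db_path: str, connect: Callable[[str], Any]) -> None:
        self.db_path = db_path
        self.connection = connect(db_path)

    @classmethod
    def get(
        cls,
        db_path: str = ":memory:",
        connect: Callable[[str], Any] = sqlite3.connect,  # stand-in for duckdb.connect
    ) -> "SharedConnection":
        # The first call creates the connection; later calls return the same
        # instance and ignore their arguments.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(db_path, connect)
            return cls._instance

    @classmethod
    def reset(cls) -> None:
        # Close the shared connection so the next get() opens a fresh one.
        with cls._lock:
            if cls._instance is not None:
                cls._instance.connection.close()
                cls._instance = None


class StepStateStore:
    """Hypothetical state store that takes a connection object, not a path."""

    def __init__(self, connection: Any) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS step_state "
            "(step_name TEXT PRIMARY KEY, status TEXT)"
        )

    def set_status(self, step_name: str, status: str) -> None:
        # Delete-then-insert keeps the SQL portable across engines.
        self._conn.execute("DELETE FROM step_state WHERE step_name = ?", (step_name,))
        self._conn.execute("INSERT INTO step_state VALUES (?, ?)", (step_name, status))

    def get_status(self, step_name: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT status FROM step_state WHERE step_name = ?", (step_name,)
        ).fetchone()
        return row[0] if row else None
```

Injecting the connection keeps the scheduler free of path handling and makes it easy to hand it an in-memory database in tests, while the singleton ensures that every component shares the same state.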

sundarshankar89 · 2025-04-09T12:58:36Z

src/databricks/labs/remorph/connections/database_manager.py


 from sqlalchemy import create_engine
 from sqlalchemy.engine import Engine, Result, URL
+from snowflake.sqlalchemy import URL as SnowflakeURL


I always go back and forth on this: using the Snowflake connector library vs. using vanilla pyodbc with SQLAlchemy.
Less library management, among other advantages.

Should I pull this out of this PR for now? I only added it since I was adding a Postgres connector anyway.

Leave it as is; we will add it to NOTICE in a separate PR.

sundarshankar89 · 2025-04-17T11:51:08Z

@goodwillpunning can you recreate the PR directly, not from a fork, so our CI runs?

gueniai · 2025-04-25T03:40:17Z

@goodwillpunning can we close this one now?

goodwillpunning · 2025-04-25T14:59:38Z

Closing this in favor of #1520 and #1522.

Add scheduler components and install instructions

2ec9612

goodwillpunning added the enhancement (New feature or request) and do-not-merge labels Mar 27, 2025

goodwillpunning requested a review from a team as a code owner March 27, 2025 21:14

goodwillpunning had a problem deploying to tool March 27, 2025 21:14 — with GitHub Actions Failure

Add integration tests for usage collection scheduler

9704e3c

goodwillpunning had a problem deploying to tool March 31, 2025 18:28 — with GitHub Actions Failure

sundarshankar89 reviewed Apr 1, 2025

goodwillpunning had a problem deploying to tool April 1, 2025 06:34 — with GitHub Actions Failure

Add Postgres connector and mock creds

0c3c902

goodwillpunning had a problem deploying to tool April 1, 2025 15:03 — with GitHub Actions Failure

Move scheduler install instructions to docs folder

05913e1

goodwillpunning had a problem deploying to tool April 1, 2025 15:13 — with GitHub Actions Failure

Move system service files back under src directory

1d7c797

goodwillpunning had a problem deploying to tool April 1, 2025 15:22 — with GitHub Actions Failure

Add SnowflakeConnector

efaa86f

goodwillpunning had a problem deploying to tool April 1, 2025 21:05 — with GitHub Actions Failure

goodwillpunning had a problem deploying to tool April 2, 2025 03:59 — with GitHub Actions Failure

goodwillpunning had a problem deploying to tool April 2, 2025 04:06 — with GitHub Actions Failure

Add pipeline scheduler class and expose public execute function in Pi…

8f8f065

…pelineClass

goodwillpunning had a problem deploying to tool April 8, 2025 13:31 — with GitHub Actions Failure

Add retry for error status and move string literals into Enum classes

95bae1e

goodwillpunning had a problem deploying to tool April 8, 2025 17:12 — with GitHub Actions Failure

sundarshankar89 reviewed Apr 9, 2025

Add DuckDBManager to manage a single DuckDB connection

316092f

goodwillpunning had a problem deploying to tool April 9, 2025 16:09 — with GitHub Actions Failure

gueniai added the feat/profiler (Issues related to profilers) label Apr 15, 2025

sundarshankar89 changed the title ~~[WIP] Add scheduler components and installation instructions~~ Add scheduler components and installation instructions Apr 15, 2025

goodwillpunning closed this Apr 25, 2025

goodwillpunning deleted the feature/add_usage_collection_scheduler branch April 25, 2025 14:59
