
Conversation

@goodwillpunning (Contributor)

Changes

What does this PR do?

Adds a new function to the common job deployer to install the local ingestion job. The job transforms profiler extracts into Unity Catalog–managed tables in the user’s local Databricks workspace, enabling the profiler summary (“local”) dashboards.

Relevant implementation details

  • The implementation closely follows the existing reconcile job deployment. Please verify that the install_state isn’t lost between create/update and save, especially if an exception is raised before the save.
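The create/update-then-save concern above can be sketched as follows. This is a minimal illustration, not the actual lakebridge code: `InstallState` here is a stand-in for the persisted installation state, and `deploy_profiler_ingestion_job` is a hypothetical name.

```python
# Sketch of the review concern: state must be persisted even when
# create/update raises partway through. All names are illustrative.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InstallState:
    """Stand-in for the persisted installation state."""
    jobs: dict = field(default_factory=dict)
    saved: bool = False

    def save(self) -> None:
        self.saved = True


def deploy_profiler_ingestion_job(state: InstallState, create_or_update: Callable[[], int]) -> None:
    try:
        job_id = create_or_update()
        state.jobs["profiler-ingestion"] = job_id
    finally:
        # Persist whatever state we have even if create/update raised,
        # so a partially recorded deployment is not lost on retry.
        state.save()
```

The `try`/`finally` guarantees `save()` runs on the exception path, which is exactly the property the reviewer is asked to verify in the real deployer.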

Caveats/things to watch out for when reviewing:

Linked issues

This PR complements PR#2000.

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • installed as part of the CLI command (see PR#2000)

Tests

  • manually tested
  • added unit tests
  • added integration tests

)
],
"tasks": [
NotebookTask(
@goodwillpunning (Contributor Author) commented:

@sundarshankar89's comment from PR#2000: "There are 2 ways we can implement this, have the ingestion job as python package and use a wheel task Or have the notebook upload and then run the jobs. I prefer option 1."
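For context, the two options in the quoted comment map onto different task shapes in the Databricks Jobs API: a `python_wheel_task` runs an entry point from an attached wheel, while a `notebook_task` runs an uploaded notebook. A plain-dict sketch of the contrast; the package name, wheel path, notebook path, and parameters are illustrative assumptions, not the PR's actual values (the `dashboards` entry point is the one declared later in this PR's pyproject excerpt):

```python
# Option 1 (preferred in the quoted comment): package the ingestion logic
# as a wheel and run it as a python_wheel_task.
wheel_task = {
    "task_key": "profiler_ingestion",
    "python_wheel_task": {
        "package_name": "databricks_labs_lakebridge",   # assumed package name
        "entry_point": "dashboards",                    # from [project.entry-points.databricks]
        "parameters": ["<catalog>", "<schema>", "<extract-path>"],  # placeholders
    },
    "libraries": [
        {"whl": "/Workspace/path/to/lakebridge.whl"},   # illustrative path
        {"pypi": {"package": "duckdb"}},
    ],
}

# Option 2: upload a notebook and run it as a notebook_task.
notebook_task = {
    "task_key": "profiler_ingestion",
    "notebook_task": {"notebook_path": "/Workspace/path/to/ingestion"},  # illustrative
}
```

The wheel route keeps the ingestion logic versioned and testable with the package itself, which is likely why option 1 is preferred.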

@github-actions

github-actions bot commented Oct 17, 2025

✅ 46/46 passed, 6 flaky, 3m13s total

Flaky tests:

  • 🤪 test_validate_non_empty_tables (24ms)
  • 🤪 test_transpiles_informatica_to_sparksql (14.369s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (15.781s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (3.611s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (19.249s)
  • 🤪 test_transpile_teradata_sql (11.75s)

Running from acceptance #2737

@codecov

codecov bot commented Oct 22, 2025

Codecov Report

❌ Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 64.93%. Comparing base (3d54bc0) to head (b951b8b).

Files with missing lines Patch % Lines
.../labs/lakebridge/assessments/profiler_validator.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2101      +/-   ##
==========================================
+ Coverage   64.78%   64.93%   +0.15%     
==========================================
  Files          96       96              
  Lines        7891     7929      +38     
  Branches      820      822       +2     
==========================================
+ Hits         5112     5149      +37     
- Misses       2599     2600       +1     
  Partials      180      180              


def _job_profiler_ingestion_task(self, task_key: str, description: str, lakebridge_wheel_path: str) -> Task:
libraries = [
compute.Library(whl=lakebridge_wheel_path),
compute.PythonPyPiLibrary(package="duckdb")
@goodwillpunning (Contributor Author) commented:

The ingestion job depends on the duckdb library to read the profiler extract tables.


def main(*argv) -> None:
logger.debug(f"Arguments received: {argv}")
assert len(sys.argv) == 4, f"Invalid number of arguments: {len(sys.argv)}"
@goodwillpunning (Contributor Author) commented:

"Manually" testing this main() function outside of the wheel file, there appeared to be 3 additional arguments pertaining to the Python notebook session: 1) Interpreter, 2) -f flag, 3) env settings as a JSON file. Please review that the assumption that they will not be present in a wheel based job task is correct.


[project.entry-points.databricks]
reconcile = "databricks.labs.lakebridge.reconcile.execute:main"
dashboards = "databricks.labs.lakebridge.assessments.dashboards.execute:main"
Collaborator commented:

Should this be profiler_dashboards for clarity?
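If that rename were adopted, the entry-point table would read as follows (a sketch of the suggested change, not a committed decision):

```toml
[project.entry-points.databricks]
reconcile = "databricks.labs.lakebridge.reconcile.execute:main"
profiler_dashboards = "databricks.labs.lakebridge.assessments.dashboards.execute:main"
```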


Labels

feat/profiler (Issues related to profilers)
