feat(ingest/unity): Use sql to extract query history for usage #14953

treff7es · 2025-10-08T12:55:18Z

No description provided.

codecov · 2025-10-08T12:58:13Z

Codecov Report

❌ Patch coverage is 81.25000% with 9 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...estion/src/datahub/ingestion/source/unity/usage.py	52.63%	9 Missing ⚠️

skrydal · 2025-10-08T13:31:55Z

metadata-ingestion/src/datahub/ingestion/source/unity/usage.py

+                    self.config.start_time, self.config.end_time
+                )
+            else:
+                raise ValueError(f"Unsupported usage_data_source: {usage_data_source}")


Isn't this already caught by pydantic? If not, we could check it in the validator.
That still leaves the question of what to do in the else branch. Maybe we could rewrite the check as:

if SYSTEM_TABLES: elif API: else: # assume AUTO (can add an assertion on that)

That way we don't mix up different layers of business logic (parameter validation and execution logic).
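A minimal sketch of the split suggested above, using only the standard library so it runs on its own. The enum values, `parse_usage_data_source`, and `pick_query_source` are hypothetical names for illustration, and the AUTO behaviour (prefer system tables when a warehouse is configured) is an assumption drawn from this thread, not the merged code. In the real config, a pydantic enum field would likely play the role of `parse_usage_data_source`.

```python
from enum import Enum


class UsageDataSource(str, Enum):
    # Member names come from the review thread; the string values are assumed.
    AUTO = "AUTO"
    SYSTEM_TABLES = "SYSTEM_TABLES"
    API = "API"


def parse_usage_data_source(raw: str) -> UsageDataSource:
    """Validation layer: reject unknown values once, when the config is parsed."""
    try:
        return UsageDataSource(raw)
    except ValueError:
        raise ValueError(f"Unsupported usage_data_source: {raw}") from None


def pick_query_source(source: UsageDataSource, has_warehouse: bool) -> str:
    """Execution layer: dispatch only, no re-validation of the parameter."""
    if source is UsageDataSource.SYSTEM_TABLES:
        return "system_tables"
    elif source is UsageDataSource.API:
        return "api"
    else:
        # Only AUTO can reach this branch if validation did its job.
        assert source is UsageDataSource.AUTO
        # Assumed AUTO rule: use system tables only when a warehouse is set.
        return "system_tables" if has_warehouse else "api"
```

With this shape the else branch never needs to raise, because an unsupported value cannot get past parsing.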

metadata-ingestion/src/datahub/ingestion/source/unity/config.py

skrydal · 2025-10-08T13:33:18Z

metadata-ingestion/tests/unit/test_unity_catalog_config.py

+
+
+def test_usage_data_source_system_tables_requires_warehouse_id():
+    """Test that usage_data_source=SYSTEM_TABLES requires warehouse_id."""


I wonder whether we should be adding such comments, do you find them helpful? Seems like they repeat what is already stated in the name of the test method.

skrydal · 2025-10-08T13:36:14Z

metadata-ingestion/tests/unit/test_unity_catalog_config.py

I like the completeness of these tests. Maybe we should consider grouping them in a single class (not a strong opinion). Since the new default will be to use system tables, I wonder why other tests are not affected. Is it because we never set warehouse_id in the tests, so the system-tables functions are never used?
Could we add one or more end-to-end tests that exercise the logic in get_query_history_via_system_tables and check, as far as possible, that it works? This can't be fully verified without a live Databricks instance to query, of course, only to the extent that unit tests allow.

Exactly, we never set warehouse_id in the tests.
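One way such a test could look without a live workspace is to stub the SQL execution and assert on what comes out. This is only a sketch: `SketchProxy` is a hypothetical, trimmed-down stand-in for the real proxy class, the query text is a placeholder, and the dict it yields stands in for the real `Query` object.

```python
from datetime import datetime
from types import SimpleNamespace
from unittest.mock import MagicMock


class SketchProxy:
    """Hypothetical stand-in for the proxy; only the row mapping is modelled."""

    def __init__(self, execute_sql_query):
        self._execute_sql_query = execute_sql_query

    def get_query_history_via_system_tables(self, start_time, end_time):
        # Placeholder text; the real query is not shown in this thread.
        query = "SELECT ... FROM system.query.history WHERE ..."
        for row in self._execute_sql_query(query, (start_time, end_time)):
            yield {
                "query_id": row.statement_id or None,
                "query_text": row.statement_text,
                "user_name": row.executed_by or None,
            }


def test_system_tables_rows_become_queries():
    start, end = datetime(2025, 10, 1), datetime(2025, 10, 8)
    fake_rows = [
        SimpleNamespace(statement_id="q1", statement_text="SELECT 1", executed_by="alice"),
        SimpleNamespace(statement_id="", statement_text="SELECT 2", executed_by=""),
    ]
    execute = MagicMock(return_value=fake_rows)
    proxy = SketchProxy(execute)

    queries = list(proxy.get_query_history_via_system_tables(start, end))

    # The time window must be passed through as SQL parameters.
    assert execute.call_args.args[1] == (start, end)
    assert queries[0]["query_id"] == "q1"
    # Empty strings coming back from SQL are normalised to None.
    assert queries[1]["query_id"] is None
    assert queries[1]["user_name"] is None
```

A test like this checks the row-to-object mapping and the parameter passing, but not the SQL itself, which still needs a real warehouse to validate.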

sgomezvillamor · 2025-10-08T13:36:24Z

metadata-ingestion/src/datahub/ingestion/source/unity/proxy.py

+        try:
+            rows = self._execute_sql_query(query, (start_time, end_time))
+            for row in rows:
+                try:
+                    yield Query(
+                        query_id=row.statement_id if row.statement_id else None,
+                        query_text=row.statement_text,
+                        statement_type=(
+                            QueryStatementType(row.statement_type)
+                            if row.statement_type
+                            else None
+                        ),
+                        start_time=row.start_time,
+                        end_time=row.end_time,
+                        user_id=row.executed_by_user_id,
+                        user_name=row.executed_by if row.executed_by else None,
+                        executed_as_user_id=(
+                            row.executed_as_user_id if row.executed_as_user_id else None
+                        ),
+                        executed_as_user_name=(
+                            row.executed_as if row.executed_as else None
+                        ),
+                    )
+                except Exception as e:
+                    logger.warning(f"Error parsing query from system table: {e}")
+                    self.report.report_warning("query-parse-system-table", str(e))
+        except Exception as e:
+            logger.error(
+                f"Error fetching query history from system tables: {e}", exc_info=True
+            )
+            self.report.report_warning("get-query-history-system-tables", str(e))


Some comments here:

Let's use a proper structured report with title, message, context, ...

I miss some metrics on the number of results or the query's execution time, unless that's already handled upstream in the call stack.

The recurring pattern x if x else None: isn't that redundant and unnecessary?
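On the last point, a small sketch may help: `x if x else None` is not a no-op, since it maps falsy values such as an empty string to None, but it can be written more compactly as `x or None`. The function names and sample values below are hypothetical.

```python
def normalize_verbose(value):
    # Pattern used in the diff: falsy values ("" or 0 or None) become None.
    return value if value else None


def normalize_short(value):
    # Equivalent and shorter: `or` falls through to None for any falsy value.
    return value or None


samples = ["stmt-123", "", None, 0]
results = [(normalize_verbose(s), normalize_short(s)) for s in samples]
```

If the columns can never be empty strings, the conversion could be dropped entirely. Otherwise `or None` keeps the behaviour with less noise.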

skrydal · 2025-10-08T13:37:33Z

metadata-ingestion/src/datahub/ingestion/source/unity/proxy.py

+            logger.error(
+                f"Error fetching query history from system tables: {e}", exc_info=True
+            )
+            self.report.report_warning("get-query-history-system-tables", str(e))


I would consider using failure here, not a strong opinion though.

sgomezvillamor

Overall LGTM with some minor comments

What I really miss is some test on the new feature: fetching from system tables, how are you planning to cover that?

sgomezvillamor · 2025-10-13T12:27:03Z

metadata-ingestion/src/datahub/ingestion/source/unity/proxy.py

+                title="Failed to fetch query history from system tables",
+                message=f"Error querying system.query.history table: {e}",
+                context=f"Query period: {start_time} to {end_time}",


title and message are used for grouping logs, so the exception text should stay out of message and be passed as exc instead.

Suggested change

-                title="Failed to fetch query history from system tables",
-                message=f"Error querying system.query.history table: {e}",
-                context=f"Query period: {start_time} to {end_time}",
+                title="Failed to fetch query history from system tables",
+                message=f"Error querying system.query.history table",
+                context=f"Query period: {start_time} to {end_time}",
+                exc=e,
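To illustrate why this matters for grouping, here is a small sketch with a hypothetical `SketchReport` class that groups entries by (title, message). It is not DataHub's report API, only a model of the grouping behaviour described in the comment.

```python
from collections import defaultdict


class SketchReport:
    """Hypothetical report that groups warnings by (title, message)."""

    def __init__(self):
        self.groups = defaultdict(list)

    def warning(self, title, message, context=None, exc=None):
        # Variable details live in context/exc, not in the grouping key.
        self.groups[(title, message)].append((context, repr(exc) if exc else None))


TITLE = "Failed to fetch query history from system tables"
errors = [
    TimeoutError("warehouse timed out after 30s"),
    TimeoutError("warehouse timed out after 31s"),
]

interpolated = SketchReport()
stable = SketchReport()
for e in errors:
    # Exception text inside the message: every distinct error is its own group.
    interpolated.warning(
        title=TITLE,
        message=f"Error querying system.query.history table: {e}",
    )
    # Stable message plus exc: both errors land in the same group.
    stable.warning(
        title=TITLE,
        message="Error querying system.query.history table",
        exc=e,
    )
```

In this sketch `interpolated` ends up with two groups while `stable` keeps a single group holding both occurrences.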

github-actions bot added the ingestion label Oct 8, 2025

datahub-cyborg bot added the needs-review label Oct 8, 2025

skrydal reviewed Oct 8, 2025

sgomezvillamor reviewed Oct 8, 2025

datahub-cyborg bot added pending-submitter-response and removed needs-review labels Oct 8, 2025

datahub-cyborg bot added needs-review and removed pending-submitter-response labels Oct 8, 2025

treff7es added 3 commits October 13, 2025 11:04

Use sql to extract usage like lineage

bfe7455

Fix indent

ec44984

Lint fix

9f5ad21

treff7es force-pushed the unity_usage_sql branch from d185b34 to 9f5ad21 Compare October 13, 2025 09:04

Fix tests

cde4319

Address pr review comments

576420a

treff7es requested review from sgomezvillamor and skrydal October 13, 2025 11:29

sgomezvillamor reviewed Oct 13, 2025

sgomezvillamor approved these changes Oct 13, 2025

datahub-cyborg bot added pending-submitter-merge and removed needs-review labels Oct 13, 2025

Fix report error

78cba07

Add user agent to sql queries

f3eff0d

Fix linter issues

4c5abb8

treff7es merged commit 18d3eec into master Oct 15, 2025
71 of 73 checks passed

treff7es deleted the unity_usage_sql branch October 15, 2025 08:47
