

treff7es
Contributor

@treff7es treff7es commented Oct 8, 2025

No description provided.

@github-actions github-actions bot added the ingestion label Oct 8, 2025

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 81.25000% with 9 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...estion/src/datahub/ingestion/source/unity/usage.py | 52.63% | 9 Missing ⚠️


        self.config.start_time, self.config.end_time
    )
else:
    raise ValueError(f"Unsupported usage_data_source: {usage_data_source}")
Collaborator

Isn't this caught automatically by pydantic? If not, we could already check it in the validator.
That leaves the question of what to do in that else branch. Maybe we could rewrite the check as:

if SYSTEM_TABLES:
    ...
elif API:
    ...
else:  # assume AUTO (we can add an assertion for that)
    ...

That way we don't mix different layers of business logic (parameter validation and execution logic).
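A minimal sketch of that split, assuming a string enum named UsageDataSource with the values AUTO, SYSTEM_TABLES and API (the value names come from this discussion; the enum, parse_usage_data_source and pick_fetcher are hypothetical). The enum lookup rejects unknown values when the config is parsed, roughly what pydantic does for an enum-typed field, so the execution code can treat the final else as AUTO. The AUTO rule used here (system tables when a warehouse_id is set, API otherwise) is an assumption based on the later remark that the tests never set warehouse_id.

```python
from enum import Enum
from typing import Optional


class UsageDataSource(str, Enum):
    # Hypothetical enum mirroring the usage_data_source values in this PR.
    AUTO = "AUTO"
    SYSTEM_TABLES = "SYSTEM_TABLES"
    API = "API"


def parse_usage_data_source(raw: str) -> UsageDataSource:
    # Validation layer: an unknown value raises ValueError here, at config
    # time, instead of reaching the execution code.
    return UsageDataSource(raw.upper())


def pick_fetcher(source: UsageDataSource, warehouse_id: Optional[str]) -> str:
    # Execution layer: no "unsupported value" branch is needed here.
    if source == UsageDataSource.SYSTEM_TABLES:
        return "system_tables"
    elif source == UsageDataSource.API:
        return "api"
    else:
        # Only AUTO can get here; the assertion documents that invariant.
        assert source == UsageDataSource.AUTO, f"unexpected source: {source}"
        # Assumed AUTO rule: prefer system tables when a warehouse is configured.
        return "system_tables" if warehouse_id else "api"
```

With this shape, config parsing is the single place responsible for rejecting bad values, and the fetch path no longer needs its own ValueError.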



def test_usage_data_source_system_tables_requires_warehouse_id():
"""Test that usage_data_source=SYSTEM_TABLES requires warehouse_id."""
Collaborator

I wonder whether we should be adding such docstrings. Do you find them helpful? They seem to repeat what the test method name already states.

Collaborator

I like the completeness of those tests. Maybe we should consider grouping them in a single class (not a strong opinion). Since the new default is now to use system tables, I wonder why the other tests are not affected. Is it because we never set warehouse_id in the tests, so the system tables functions are not used?
Could we add one (or more) end-to-end tests to see the logic in get_query_history_via_system_tables in action and assess, to the extent unit tests allow, that it works? (Of course this is not fully possible without a live Databricks instance to query.)
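One possible shape for such a test, grouped in a single class as suggested, assuming the row-mapping loop can be exercised with a stubbed query executor. fetch_query_history, QUERY and the dict-shaped records are hypothetical simplifications of get_query_history_via_system_tables, not the PR's code; only the row attributes (statement_id, statement_text, executed_by) and the (start_time, end_time) parameters are taken from the snippet reviewed in this PR.

```python
import unittest
from datetime import datetime, timedelta
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Iterator, List, Tuple
from unittest import mock

# Placeholder SQL; the real query text and parameter style are not shown here.
QUERY = "SELECT ... FROM system.query.history"


def fetch_query_history(
    execute_sql: Callable[[str, Tuple[datetime, datetime]], Iterable[Any]],
    start_time: datetime,
    end_time: datetime,
    warnings: List[str],
) -> Iterator[dict]:
    # Hypothetical stand-in for get_query_history_via_system_tables.
    for row in execute_sql(QUERY, (start_time, end_time)):
        try:
            record = {
                "query_id": row.statement_id or None,
                "query_text": row.statement_text,
                "user_name": row.executed_by or None,
            }
        except AttributeError as e:
            # Record the bad row and keep going, like the per-row except.
            warnings.append(str(e))
            continue
        yield record


class TestQueryHistoryViaSystemTables(unittest.TestCase):
    def setUp(self) -> None:
        self.end = datetime(2025, 10, 8)
        self.start = self.end - timedelta(days=1)

    def test_rows_are_mapped_and_time_window_is_passed(self) -> None:
        rows = [SimpleNamespace(statement_id="q1", statement_text="SELECT 1", executed_by="alice")]
        execute_sql = mock.Mock(return_value=rows)
        warnings: List[str] = []
        result = list(fetch_query_history(execute_sql, self.start, self.end, warnings))
        self.assertEqual(
            result, [{"query_id": "q1", "query_text": "SELECT 1", "user_name": "alice"}]
        )
        execute_sql.assert_called_once_with(QUERY, (self.start, self.end))
        self.assertEqual(warnings, [])

    def test_unparseable_row_is_reported_and_skipped(self) -> None:
        rows = [
            object(),  # has none of the expected attributes
            SimpleNamespace(statement_id="", statement_text="SELECT 2", executed_by=""),
        ]
        warnings: List[str] = []
        result = list(
            fetch_query_history(mock.Mock(return_value=rows), self.start, self.end, warnings)
        )
        self.assertEqual(
            result, [{"query_id": None, "query_text": "SELECT 2", "user_name": None}]
        )
        self.assertEqual(len(warnings), 1)
```

The class runs under pytest or python -m unittest. Stubbing the executor keeps the test offline while still checking the time window passed to the query and the per-row error handling.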

Contributor Author

Exactly, we never set warehouse_id.

Comment on lines 444 to 474
try:
    rows = self._execute_sql_query(query, (start_time, end_time))
    for row in rows:
        try:
            yield Query(
                query_id=row.statement_id if row.statement_id else None,
                query_text=row.statement_text,
                statement_type=(
                    QueryStatementType(row.statement_type)
                    if row.statement_type
                    else None
                ),
                start_time=row.start_time,
                end_time=row.end_time,
                user_id=row.executed_by_user_id,
                user_name=row.executed_by if row.executed_by else None,
                executed_as_user_id=(
                    row.executed_as_user_id if row.executed_as_user_id else None
                ),
                executed_as_user_name=(
                    row.executed_as if row.executed_as else None
                ),
            )
        except Exception as e:
            logger.warning(f"Error parsing query from system table: {e}")
            self.report.report_warning("query-parse-system-table", str(e))
except Exception as e:
    logger.error(
        f"Error fetching query history from system tables: {e}", exc_info=True
    )
    self.report.report_warning("get-query-history-system-tables", str(e))
Contributor

@sgomezvillamor sgomezvillamor Oct 8, 2025

Some comments here:

  • let's use a proper structured report with title, message, context, ...
  • I miss some metrics on the number of results or the query execution time, unless that's already done upstream in the call stack
  • the recurring pattern x if x else None: isn't that redundant and unnecessary?
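A minimal sketch of the second and third points, assuming the fetch loop can be wrapped in a generator that updates counters. QueryHistoryStats, none_if_empty and iter_rows_with_stats are hypothetical names, not DataHub APIs. On the third point, x if x else None gives the same result as x or None, and it only differs from plain x when the column holds a falsy non-None value such as an empty string; whether that normalization is wanted depends on what the connector actually returns for empty columns.

```python
import time
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Iterable, Iterator, Optional


@dataclass
class QueryHistoryStats:
    # Hypothetical counters; a real source report may name these differently.
    rows_fetched: int = 0
    rows_failed: int = 0
    fetch_seconds: float = 0.0


def none_if_empty(value: Optional[str]) -> Optional[str]:
    # Same result as `value if value else None`; differs from plain `value`
    # only when the column holds a falsy non-None value such as "".
    return value or None


def iter_rows_with_stats(
    rows: Iterable[Any], stats: QueryHistoryStats
) -> Iterator[dict]:
    start = time.perf_counter()
    try:
        for row in rows:
            try:
                record = {
                    "query_id": none_if_empty(row.statement_id),
                    "query_text": row.statement_text,
                    "user_name": none_if_empty(row.executed_by),
                }
            except AttributeError:
                # Count rows that cannot be mapped instead of failing the batch.
                stats.rows_failed += 1
                continue
            stats.rows_fetched += 1
            yield record
    finally:
        # Wall-clock time while the generator was active, including time the
        # consumer spends between rows; runs when it is exhausted or closed.
        stats.fetch_seconds += time.perf_counter() - start


if __name__ == "__main__":
    demo = [SimpleNamespace(statement_id="q1", statement_text="SELECT 1", executed_by="")]
    demo_stats = QueryHistoryStats()
    print(list(iter_rows_with_stats(demo, demo_stats)), demo_stats)
```

Counting before the yield means a consumer that stops early still sees every row it received reflected in rows_fetched.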

logger.error(
    f"Error fetching query history from system tables: {e}", exc_info=True
)
self.report.report_warning("get-query-history-system-tables", str(e))
Collaborator

I would consider reporting a failure here rather than a warning, though it's not a strong opinion.

Contributor

@sgomezvillamor sgomezvillamor left a comment

Overall LGTM, with some minor comments.

What I really miss is a test for the new feature, fetching from system tables. How are you planning to cover that?

@datahub-cyborg datahub-cyborg bot added pending-submitter-response and removed needs-review labels Oct 8, 2025
@datahub-cyborg datahub-cyborg bot added needs-review and removed pending-submitter-response labels Oct 8, 2025
Comment on lines 471 to 473
title="Failed to fetch query history from system tables",
message=f"Error querying system.query.history table: {e}",
context=f"Query period: {start_time} to {end_time}",
Contributor

title and message are used for grouping log entries, so the message should not include the exception text; pass the exception via exc instead.

Suggested change
- title="Failed to fetch query history from system tables",
- message=f"Error querying system.query.history table: {e}",
- context=f"Query period: {start_time} to {end_time}",
+ title="Failed to fetch query history from system tables",
+ message="Error querying system.query.history table",
+ context=f"Query period: {start_time} to {end_time}",
+ exc=e,
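A minimal sketch of why the exception text belongs in exc rather than in message, assuming a report that groups entries by (title, message). StructuredReportSketch is hypothetical and is not DataHub's report class; only the keyword names title, message, context and exc come from the suggested change. It exposes warning and failure separately, which is the choice raised earlier in the thread.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Optional, Tuple

GroupKey = Tuple[str, str]


@dataclass
class StructuredReportSketch:
    # Hypothetical report: entries with the same (title, message) share a
    # group, and each group keeps the per-occurrence details.
    warnings: DefaultDict[GroupKey, List[str]] = field(
        default_factory=lambda: defaultdict(list)
    )
    failures: DefaultDict[GroupKey, List[str]] = field(
        default_factory=lambda: defaultdict(list)
    )

    @staticmethod
    def _detail(context: str, exc: Optional[BaseException]) -> str:
        # The exception text goes into the per-occurrence detail, not the key.
        if exc is None:
            return context
        return f"{context} ({type(exc).__name__}: {exc})"

    def warning(
        self, title: str, message: str, context: str = "",
        exc: Optional[BaseException] = None,
    ) -> None:
        self.warnings[(title, message)].append(self._detail(context, exc))

    def failure(
        self, title: str, message: str, context: str = "",
        exc: Optional[BaseException] = None,
    ) -> None:
        self.failures[(title, message)].append(self._detail(context, exc))
```

With the exception interpolated into message, every distinct error text would open its own group, which is what the suggested change avoids.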

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review labels Oct 13, 2025
@github-actions github-actions bot requested a deployment to datahub-wheels (Preview) October 13, 2025 13:33 Abandoned
@treff7es treff7es merged commit 18d3eec into master Oct 15, 2025
71 of 73 checks passed
@treff7es treff7es deleted the unity_usage_sql branch October 15, 2025 08:47

Labels

ingestion, pending-submitter-merge


3 participants