@jb3rndt (Collaborator) commented Oct 19, 2025

This PR is based on #3, but I'm opening it now to get your feedback early :)

Adds three new metrics:

  • correctness: compares each data point with its ground truth using a distance function (absolute difference for numbers, Levenshtein distance for strings)
  • currency: takes a decline rate per column, the name of the column that holds the assessment date of each value in the tuple, and optionally a simulated assessment date so the result doesn't rely on "now". The currency is calculated with this formula: curr(w, A) = exp(-decline(A) * age(w, A)) (with w the attribute value and A the column)
    • the age is measured in years right now, so the decline rate is per year. It might be useful to make that configurable too?
  • rule-based consistency: checks whether the given rules per column hold on an attribute. Weighting a rule happens inside the rule definition itself. The return values of all rules are simply added up when assessing the consistency value.
    • since rules are defined as Python functions right now, I allowed metrics to be initialized by passing a config object directly (keeping JSON as an option too, of course)
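
For anyone who wants to sanity-check the currency formula above, here is a minimal sketch of the per-value calculation. It is not the code in this PR: the function name `currency`, its parameters, the 365.25-day year used to convert an age to years, and the clamping of future dates are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta
from typing import Optional

# Assumption: a year is 365.25 days when converting an age to years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def currency(
    assessed_at: datetime,
    decline_per_year: float,
    reference: Optional[datetime] = None,
) -> float:
    """curr(w, A) = exp(-decline(A) * age(w, A)), with the age in years.

    `reference` stands in for the simulated assessment date; if it is
    omitted, the current time is used.
    """
    if reference is None:
        reference = datetime.now()
    age_years = (reference - assessed_at).total_seconds() / SECONDS_PER_YEAR
    # Assumption: values dated after the reference count as age 0.
    age_years = max(age_years, 0.0)
    return math.exp(-decline_per_year * age_years)


assessed = datetime(2024, 1, 1)
one_year_later = assessed + timedelta(days=365.25)

fresh = currency(assessed, 0.2, reference=assessed)
aged = currency(assessed, 0.2, reference=one_year_later)
```

With a decline rate of 0.2 per year, a value assessed at the reference date scores 1.0, and one assessed exactly one year earlier scores exp(-0.2), roughly 0.82. Passing `reference` explicitly keeps the result reproducible, which is the point of the simulated assessment date.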

Copilot AI review requested due to automatic review settings, October 19, 2025 19:14
jb3rndt force-pushed the feat/correctness-metric branch from 3b1f382 to 9f9bc44, October 19, 2025 19:16

Copilot AI left a comment

Pull Request Overview

This PR introduces three new data quality metrics (Correctness, Currency, and Rule-based Consistency), refactors writer implementations to use a shared SQLAlchemy-based DatabaseWriter, and adds a SQLAlchemy ORM model for persisting results.

  • New metrics: Correctness (distance vs. ground truth), Currency (exponential decay by age), Rule-based Consistency (rule aggregation with certainty).
  • Refactor: Unify SQLite/Postgres writers via DatabaseWriter and SQLAlchemy ORM models; add DQDimension enum and update DQResult to use it.
  • Config handling: Add MetricConfig base and load_config utility; allow passing config objects (notably for rule-based consistency).
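
A rough sketch of how rule-based consistency could be driven by a config object, following the PR description: rules are plain Python functions, each rule carries its own weight, and the return values are summed. `RuleConsistencyConfig`, `assess_row`, and the example rules are hypothetical names, not the classes added in this PR, and the certainty annotation is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical rule type: takes an attribute value and returns its
# already-weighted contribution to the consistency score.
Rule = Callable[[Any], float]


@dataclass
class RuleConsistencyConfig:
    """Hypothetical config object mapping column names to their rules."""

    rules: Dict[str, List[Rule]] = field(default_factory=dict)


def assess_row(row: Dict[str, Any], config: RuleConsistencyConfig) -> Dict[str, float]:
    """Sum the return values of all rules for each configured column."""
    return {
        column: sum(rule(row[column]) for rule in column_rules)
        for column, column_rules in config.rules.items()
    }


# Example rules; the weight (0.5 each) lives inside the rule itself.
def non_negative(value: Any) -> float:
    return 0.5 if value >= 0 else 0.0


def below_150(value: Any) -> float:
    return 0.5 if value < 150 else 0.0


config = RuleConsistencyConfig(rules={"age": [non_negative, below_150]})

consistent = assess_row({"age": 42, "name": "Ada"}, config)
inconsistent = assess_row({"age": -3, "name": "Bob"}, config)
```

Because the rules are callables, a config like this cannot be expressed in plain JSON, which is presumably why passing the object directly is useful here.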

Reviewed Changes

Copilot reviewed 18 out of 21 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| metis/writer/sqlite_writer.py | Switch SQLiteWriter to SQLAlchemy via DatabaseWriter; provide engine factory. |
| metis/writer/postgres_writer.py | Switch PostgresWriter to SQLAlchemy via DatabaseWriter; provide engine factory. |
| metis/writer/database_writer.py | New base writer using SQLAlchemy ORM models; centralizes table creation and writes. |
| metis/writer/console_writer.py | Minor typing tweak for optional config. |
| metis/utils/result.py | DQdimension type changed to DQDimension enum. |
| metis/utils/dq_dimension.py | New DQDimension StrEnum with dimensions. |
| metis/models.py | New SQLAlchemy declarative model and dynamic table registration. |
| metis/metric/metric.py | Add MetricConfig support and config loader. |
| metis/metric/currency/currency.py | Implement Currency metric with exponential decay by age. |
| metis/metric/currency/config.py | Config dataclass for Currency. |
| metis/metric/correctness/correctness.py | Implement Correctness metric (distance-based). |
| metis/metric/consistency/rule_consistency.py | Implement rule-based consistency with certainty annotation. |
| metis/metric/consistency/consistency.py | Make Consistency accept JSON config path; switch to DQDimension. |
| metis/metric/consistency/config.py | Config dataclasses for consistency metrics. |
| metis/metric/config.py | Base config dataclass helper. |
| metis/metric/completeness/completeness.py | Switch to DQDimension and updated typing. |
| metis/metric/__init__.py | Export new metrics. |
| metis/dq_orchestrator.py | Allow MetricConfig object in orchestrator assess API. |
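
To make the distance-based comparison behind the Correctness metric concrete, here is a small self-contained sketch: absolute difference for numbers, Levenshtein distance for strings. The helper names `levenshtein` and `distance` are hypothetical, falling back to a string comparison for mixed types is an assumption, and how the PR turns a distance into a correctness score is left out because the description does not specify it.

```python
from numbers import Real
from typing import Any


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution
                )
            )
        previous = current
    return previous[-1]


def distance(value: Any, truth: Any) -> float:
    """Distance between a data point and its ground truth.

    Numbers use the absolute difference. Everything else is compared as
    strings via Levenshtein distance (an assumption for mixed types).
    """
    if isinstance(value, Real) and isinstance(truth, Real):
        return float(abs(value - truth))
    return float(levenshtein(str(value), str(truth)))
```

A distance of 0.0 means the data point matches its ground truth exactly; larger values mean a larger deviation, in the units of the respective distance function.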
