
Use worst case method for MKI / KI #34

Merged
Damonamajor merged 42 commits into main from 33-investigate-decimal-place-stability on Sep 12, 2025

Conversation

@Damonamajor (Contributor) commented Sep 2, 2025

This PR uses a worst-case-scenario method for identifying MKI / KI values in order to ensure reproducibility.

@Damonamajor Damonamajor linked an issue Sep 2, 2025 that may be closed by this pull request
@Damonamajor Damonamajor changed the title 33 investigate decimal place stability Use worst case method for MKI / KI Sep 2, 2025
Comment thread assesspy/metrics.py
-df.sort_values(by="sale_price", kind="mergesort", inplace=True)
+df.sort_values(
+    by=["sale_price", "estimate"],
+    ascending=[True, False],
@Damonamajor (Contributor, Author):

Uses ascending=False for estimate in accordance with our external guidance.

From that guidance: "After a lot of deliberation, we decided the best way forward is to assume the 'worst case scenario' for the MKI/KI metrics by sorting the data first by ascending actual value (sale price) and then by descending predicted value (modeled result). Not saying this has to be your solution, but wanted to share our thinking if helpful."
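Concretely, that worst-case ordering corresponds to something like the following minimal sketch (toy data, not the exact code in metrics.py):

import pandas as pd

# Toy ratio-study frame; values are illustrative only.
df = pd.DataFrame(
    {"sale_price": [100, 100, 250], "estimate": [95, 120, 240]}
)

# Worst-case ordering: ascending actual value (sale_price), then
# descending predicted value (estimate) so ties break deterministically.
df = df.sort_values(by=["sale_price", "estimate"], ascending=[True, False])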

Comment thread assesspy/tests/test_metrics.py Outdated
@Damonamajor Damonamajor marked this pull request as ready for review September 2, 2025 21:11
@Damonamajor Damonamajor self-assigned this Sep 3, 2025
@jeancochrane (Member) left a comment

Great work here! Overall I think this looks right, just a few small nitpicks below related to the test definition.

Comment thread assesspy/tests/test_metrics.py Outdated
Comment thread assesspy/data/quintos_sample.csv Outdated
Comment thread assesspy/data/quintos_sample.csv
Comment thread assesspy/tests/test_metrics.py Outdated
Comment thread assesspy/tests/test_metrics.py Outdated
Comment on lines +85 to +91
@pt.mark.parametrize("metric", ["mki", "ki"])
def test_quintos_metric_matches_across_estimates(metric):
    """
    For the quintos dataset, MKI/KI should be identical based
    on the ordering of estimates.
    """
    sample = ap.quintos_sample()
@jeancochrane (Member):

[Nitpick, optional] For consistency with other tests, I think it would make sense to switch to using a fixture here. Note how the unchanged tests above include the quintos_data fixture via a function parameter -- this is technically a fixture definition that inherits from the quintos_data fixture, but the principle is the same as if we were including the fixture in a test:

@pt.fixture
def metric_val(self, metric, ccao_data, quintos_data):
    if metric in ["mki", "ki"]:
        return getattr(ap, metric)(*quintos_data)
    return getattr(ap, metric)(*ccao_data)

Here's the definition of the quintos_data fixture, which pytest loads automatically from conftest.py on startup so it can pass the fixture into any fixture or function that includes it in its function parameters:

@pt.fixture(scope="session")
def quintos_data() -> tuple:
    sample = ap.quintos_sample()
    return sample.estimate, sample.sale_price

If we follow my recommendation above to save the new data in a new sample file, we would need to define a new quintos_data_with_tiebreaks fixture in conftest.py and then include it here:

Suggested change

-@pt.mark.parametrize("metric", ["mki", "ki"])
-def test_quintos_metric_matches_across_estimates(metric):
-    """
-    For the quintos dataset, MKI/KI should be identical based
-    on the ordering of estimates.
-    """
-    sample = ap.quintos_sample()
+@pt.mark.parametrize("metric", ["mki", "ki"])
+def test_quintos_metric_matches_across_estimates(metric, quintos_data_with_tiebreaks):
+    """
+    For the quintos dataset, MKI/KI should be identical based
+    on the ordering of estimates.
+    """
+    sample = quintos_data_with_tiebreaks

The change should be pretty similar if we stick with quintos_data -- we just wouldn't need the extra fixture definition in conftest.py in that case, since the quintos_data fixture already exists.
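For reference, the new fixture could look something like this in conftest.py. This is a sketch only: it assumes the new sample loader is exposed as ap.quintos_sample_with_tiebreaks() and that the column names match the unpacking used in the test below.

@pt.fixture(scope="session")
def quintos_data_with_tiebreaks() -> tuple:
    # Hypothetical loader name mirroring the existing ap.quintos_sample();
    # it would point at the sample file that contains the tied values.
    sample = ap.quintos_sample_with_tiebreaks()
    return (
        sample.sale_price,
        sample.estimate,
        sample.estimate_alt_sort_1,
        sample.estimate_alt_sort_2,
    )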

Damonamajor and others added 7 commits September 3, 2025 11:09
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
def test_mki_tiebreaks_consistent(
    self, metric, quintos_data_with_tiebreaks
):
    sale_price, estimate, estimate_alt_sort_1, estimate_alt_sort_2 = (
@Damonamajor (Contributor, Author):

We could index into the tuple instead, but unpacking feels easier to interpret later.
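For context, a rough sketch of how the unpacked values might feed the assertions; the actual test body isn't shown in this thread, so the calls below are assumptions that mirror the getattr(ap, metric)(estimate, sale_price) pattern from the existing fixture:

def test_mki_tiebreaks_consistent(self, metric, quintos_data_with_tiebreaks):
    sale_price, estimate, estimate_alt_sort_1, estimate_alt_sort_2 = (
        quintos_data_with_tiebreaks
    )
    # With the worst-case tiebreak sort in place, reordering tied
    # estimates should not change the metric.
    base = getattr(ap, metric)(estimate, sale_price)
    assert base == getattr(ap, metric)(estimate_alt_sort_1, sale_price)
    assert base == getattr(ap, metric)(estimate_alt_sort_2, sale_price)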

@jeancochrane (Member) left a comment

The new version of the test/fixture looks great! Some suggestions below, mostly just tweaks to documentation to make the purpose of this change clearer.

Comment thread assesspy/load_data.py Outdated
Comment thread assesspy/load_data.py Outdated
Comment thread assesspy/metrics.py
df.sort_values(
    by=["sale_price", "estimate"],
    ascending=[True, False],
    kind="mergesort",
@jeancochrane (Member):

[Question, optional] I wonder if this kwarg is still necessary? Per the pandas docs, kind is only used when sorting on a single column, but now we're sorting on two columns:

Choice of sorting algorithm. See also numpy.sort() for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.

I'm agnostic as to whether we should leave the kwarg in or take it out -- it doesn't seem to make anything worse to leave it in, and it could provide a layer of defensiveness preventing us from accidentally reintroducing an unstable sort if we ever decide to switch back to sorting on a single column -- but I'd be interested to see if the tests still pass when we take it out.
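One quick way to sanity-check that outside the test suite, as a sketch with toy data (not the quintos sample): sort the same frame from two different starting row orders and compare.

import pandas as pd

df = pd.DataFrame(
    {"sale_price": [100, 100, 100, 250], "estimate": [90, 110, 110, 240]}
)

# With an explicit tiebreak column, the sorted result should match even
# without kind="mergesort", regardless of the input row order.
sorted_a = df.sort_values(by=["sale_price", "estimate"], ascending=[True, False])
sorted_b = df.sample(frac=1, random_state=0).sort_values(
    by=["sale_price", "estimate"], ascending=[True, False]
)
assert sorted_a.reset_index(drop=True).equals(sorted_b.reset_index(drop=True))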

@Damonamajor (Contributor, Author):

I assumed it wouldn't affect anything. The only reason I left it in is that if we ever wanted to use the sorted dataset externally, the row order would stay the same. For example, we might want to look at class once the data is sorted for MKI. That's not a great example, but I could imagine something along those lines.

I expect it to pass even without it.

@jeancochrane (Member):

I'm fine leaving it in!

Comment thread assesspy/metrics.py Outdated
Comment thread docs/source/quintos_sample_with_tiebreaks.rst Outdated
Damonamajor and others added 5 commits September 12, 2025 14:34
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
@Damonamajor Damonamajor merged commit 488cd1e into main Sep 12, 2025
8 checks passed
@Damonamajor Damonamajor deleted the 33-investigate-decimal-place-stability branch September 12, 2025 20:19


Development

Successfully merging this pull request may close these issues.

Ensure decimal place stability for MKI

3 participants