-
Notifications
You must be signed in to change notification settings - Fork 0
Use worst case method for MKI / KI #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
7d048ed
bf585fd
c60b5fb
8278bac
9de2cb9
20451f2
e97c413
d83c541
d8bd9cf
7e518e0
3432b95
0fe2336
6908bb2
50082a9
d491933
8477716
e49fb16
4fb55e9
f5be991
fe5b445
e67f915
34e0b14
e7d25ac
1c8918c
0c5692b
3fcc03b
455085d
aa7dd08
c18b8ec
a3b96a2
31dc6eb
6574b2a
db9823d
53f52a0
24a07de
bde3a77
132d803
5861e4b
7ab91cb
cb51fc4
8bcdf41
8d4d995
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,4 +28,4 @@ estimate,sale_price | |
| 192959,235000 | ||
| 180046,250000 | ||
| 200240,279000 | ||
| 211445,295000 | ||
| 211445,295000 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| sale_price,estimate,estimate_alt_sort_1,estimate_alt_sort_2 | ||
| 32900,37299.37125,37299.37125,37299.37125 | ||
| 36000,40165.89269,40165.89269,40165.89269 | ||
| 54000,56317.4201,56317.4201,56317.4201 | ||
| 64500,66183.77244,66183.77244,66183.77244 | ||
| 68000,69486.97316,69486.97316,69486.97316 | ||
| 70000,71514.52586,71514.52586,71514.52586 | ||
| 74000,75338.28603,75338.28603,75338.28603 | ||
| 80000,81035.95111,81035.95111,81035.95111 | ||
| 84900,85672.85577,85672.85577,85672.85577 | ||
| 89000,85021.0865,94088.93683,90046.33945 | ||
| 89000,90046.33945,85021.0865,94088.93683 | ||
| 89000,94088.93683,90046.33945,85021.0865 | ||
| 105900,100227.0936,100227.0936,100227.0936 | ||
| 109000,103156.7516,103156.7516,103156.7516 | ||
| 115000,108290.1277,108290.1277,108290.1277 | ||
| 124500,117098.7563,117098.7563,117098.7563 | ||
| 129900,115346.9796,115346.9796,115346.9796 | ||
| 135000,119678.4223,119678.4223,119678.4223 | ||
| 149000,131630.9478,131630.9478,131630.9478 | ||
| 155800,137321.2061,137321.2061,137321.2061 | ||
| 163500,143973.5639,143973.5639,143973.5639 | ||
| 175000,153571.8563,153571.8563,153571.8563 | ||
| 179000,148456.8866,148456.8866,148456.8866 | ||
| 185600,153488.3876,153488.3876,153488.3876 | ||
| 199900,165039.8271,165039.8271,165039.8271 | ||
| 215000,176939.5763,176939.5763,176939.5763 | ||
| 235000,192959.3127,192959.3127,192959.3127 | ||
| 250000,180046.1193,180046.1193,180046.1193 | ||
| 279000,200240.2442,200240.2442,200240.2442 | ||
| 295000,211445.4891,211445.4891,211445.4891 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -211,8 +211,21 @@ def _calculate_gini( | |
| .reset_index(drop=True) | ||
| ) | ||
| df = pd.concat([estimate, sale_price], axis=1) | ||
| # Mergesort is required for stable sort results | ||
| df.sort_values(by="sale_price", kind="mergesort", inplace=True) | ||
| # This Gini coefficient algorithm is sensitive to the order of the input | ||
| # observations: If multiple observations share the same sale price but have | ||
| # different estimates, the output coefficients will be different depending | ||
| # on which of the sales with identical prices gets ordered first in the | ||
| # input dataframe. To ensure a stable sort order, Quintos recommends | ||
| # sorting by ascending sale price and then by descending estimate to break | ||
| # any ties. This produces "worst case" MKI/KI statistics, but ensures those | ||
| # statistics are deterministic. See this issue for more discussion: | ||
| # https://github.com/ccao-data/assesspy/issues/33#issuecomment-3180632954 | ||
| df.sort_values( | ||
| by=["sale_price", "estimate"], | ||
| ascending=[True, False], | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Uses False for ascending order for estimate in accordance with our external guidance.
|
||
| kind="mergesort", | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [Question, optional] I wonder if this kwarg is still necessary? Per the pandas docs,
I'm agnostic as to whether we should leave the kwarg in or take it out -- it doesn't seem to make anything worse to leave it in, and it could provide a layer of defensiveness preventing us from accidentally reintroducing an unstable sort if we ever decide to switch back to sorting on a single column -- but I'd be interested to see if the tests still pass when we take it out.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assumed that it wouldn't affect anything. The only reason I left it in was if we ever wanted to do something with the dataset externally, it would remain the same. Maybe we wanted to look at class once sorted by MKI. That's not really a good example, but I could imagine something along these lines. I expect it to pass even without it.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm fine leaving it in! |
||
| inplace=True, | ||
| ) | ||
| df.reset_index(drop=True, inplace=True) | ||
| a_sorted, sp_sorted = df["estimate"], df["sale_price"] | ||
| n: int = a_sorted.size | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -57,6 +57,26 @@ def test_metric_value_is_correct_iaao( | |
| pt.approx(result, rel=0.02) == expected[iaao_data_name][metric] | ||
| ) | ||
|
|
||
| @pt.mark.parametrize("metric", ["mki", "ki"]) | ||
| def test_mki_tiebreaks_consistent( | ||
| self, metric, quintos_data_with_tiebreaks | ||
| ): | ||
| sale_price, estimate, estimate_alt_sort_1, estimate_alt_sort_2 = ( | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can index, but this feels easier to interpret later. |
||
| quintos_data_with_tiebreaks | ||
| ) | ||
| fn = getattr(ap, metric) | ||
|
|
||
| ref_val = fn(estimate, sale_price) | ||
|
|
||
| for idx, est in enumerate( | ||
| (estimate_alt_sort_1, estimate_alt_sort_2), start=1 | ||
| ): | ||
| val = fn(est, sale_price) | ||
| assert val == ref_val, ( | ||
| f"{metric.upper()} differs between estimate[0] and estimate_alt_sort_{idx}: " | ||
| f"{ref_val} vs {val}" | ||
| ) | ||
|
|
||
| def test_metric_has_numeric_output(self, metric_val): | ||
| assert type(metric_val) is float | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| ================================ | ||
| Sample data from Quintos studies, modified to test sort order tiebreaks | ||
| ================================ | ||
|
|
||
| .. autofunction:: assesspy.quintos_sample_with_tiebreaks |
Uh oh!
There was an error while loading. Please reload this page.