You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current implementation of the challenge score metric uses Monte Carlo tie-breaking with 10**4 permutations to approximate the expected confusion matrix. This has several drawbacks:
Computationally expensive (runtime grows as O(num_permutations n log n))
Non-reproducible due to randomness
Results can vary between runs, which complicates benchmarking and CI.
When using bootstrapping to estimate performance distributions (as we did in our paper), the repeated Monte Carlo sampling makes the metric prohibitively slow and effectively unusable.
Proposal:
Replace the sampling with an exact computation using the hypergeometric expectation. This removes stochasticity, guarantees reproducibility, and reduces runtime to O(n log n).