Add BenchmarkEvaluator with basic precision/recall computation #1870

Muhammedswalihu wants to merge 4 commits into roboflow:develop
Conversation
Muhammed Swalihu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Hi @SkalskiP @onuralpszr — I've submitted this PR for the BenchmarkEvaluator (Issue #1778). Let me know if you'd like me to fix the
Hi @Muhammedswalihu, this seems like a really valuable feature!
Hi @soumik12345, thanks for the review! I'll go ahead and:

- Replace the placeholder logic in BenchmarkEvaluator with full precision/recall/mAP computation,
- Add a working demo example (maybe in a Colab notebook for clarity), and
- Improve the test coverage with more edge cases and per-class evaluation.

Let me know if there's anything specific you'd like to see included. Appreciate the opportunity — excited to take this further!
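For reference, the mAP piece typically reduces to an interpolated average-precision computation over a per-class precision/recall curve. A minimal sketch of the Pascal VOC all-point interpolation variant (the helper name `average_precision` is mine, not part of this PR):

```python
import numpy as np


def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """All-point interpolated AP; `recall` sorted ascending, `precision` aligned."""
    # Pad the curve so it spans recall 0..1.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles where recall actually changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

mAP would then be the mean of this value across classes (and, COCO-style, across IoU thresholds).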
Hi @soumik12345, I've added a Colab-style demo notebook, BenchmarkEvaluator_Demo.ipynb! This should help users understand and adopt the module more easily. Let me know if you'd like me to polish or extend this notebook further!
soumik12345 left a comment
Hi @Muhammedswalihu, thanks for providing the PoC!
Please feel free to proceed with the actual implementation.
Also, there's no need to commit the notebook to supervision; you can just attach a Colab notebook in a comment when the PR is ready for review with the complete logic.
```python
# TODO: Add class alignment, matching using IoU
tp = len(self.predictions.xyxy)  # Placeholder
fp = 0
fn = len(self.ground_truth.xyxy) - tp
```
The logic here is incomplete; please add the correct logic to compute precision and recall.
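A possible shape for that logic, sketched as greedy per-class IoU matching; the standalone helpers and the 0.5 threshold default are assumptions, not the PR's code:

```python
import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two xyxy boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_xyxy, pred_class_id, gt_xyxy, gt_class_id, iou_threshold=0.5):
    """Greedy matching: each ground-truth box is consumed by at most one prediction."""
    matched, tp = set(), 0
    for p_box, p_cls in zip(pred_xyxy, pred_class_id):
        best_iou, best_idx = 0.0, None
        for i, (g_box, g_cls) in enumerate(zip(gt_xyxy, gt_class_id)):
            if i in matched or p_cls != g_cls:
                continue
            iou = box_iou(p_box, g_box)
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(pred_xyxy) - tp
    fn = len(gt_xyxy) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For confidence-sensitive metrics such as mAP, predictions would additionally be sorted by confidence in descending order before matching.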
```python
from supervision.metrics.benchmark import BenchmarkEvaluator


def test_basic_precision_recall():
```
This too seems like a placeholder test; please proceed with the implementation and add comprehensive unit tests.
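For instance, a test built from hand-placed boxes whose expected TP/FP/FN counts are known up front; the `evaluate()` method name and the returned keys are assumptions about the eventual API:

```python
import numpy as np
import supervision as sv

from supervision.metrics.benchmark import BenchmarkEvaluator


def test_precision_recall_with_known_matches():
    ground_truth = sv.Detections(
        xyxy=np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float),
        class_id=np.array([0, 0]),
    )
    # One exact match (TP), one far-away box (FP); the second
    # ground-truth box goes unmatched (FN).
    predictions = sv.Detections(
        xyxy=np.array([[0, 0, 10, 10], [50, 50, 60, 60]], dtype=float),
        class_id=np.array([0, 0]),
    )
    evaluator = BenchmarkEvaluator(ground_truth=ground_truth, predictions=predictions)
    result = evaluator.evaluate()  # assumed method name
    assert result["precision"] == 0.5  # 1 TP / 2 predictions
    assert result["recall"] == 0.5  # 1 TP / 2 ground truths
```

Edge cases worth covering: empty predictions, empty ground truth, class mismatches, and duplicate predictions over the same ground-truth box.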
Great initiative on the BenchmarkEvaluator! This addresses a crucial need for standardized evaluation metrics. I'd like to offer some technical guidance to help you complete the implementation effectively.

Key Implementation Recommendations:
Performance Considerations:
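On the performance side, for example, pairwise IoU can be computed fully vectorized rather than in nested Python loops; a minimal numpy sketch (supervision ships its own IoU utilities, which would be the preferred route in the actual implementation):

```python
import numpy as np


def box_iou_matrix(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) xyxy boxes, returned as (N, M)."""
    # Broadcast (N, 1, 2) against (1, M, 2) to get all intersection corners.
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, a_min=0, a_max=None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return np.where(union > 0, inter / union, 0.0)
```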
This evaluator will be invaluable for the community's benchmarking needs. Happy to provide more specific implementation details if needed!

Best regards,
Summary
This PR introduces a utility class `BenchmarkEvaluator` in `supervision/metrics/benchmark.py` to support benchmarking object detection results across different datasets or models.

Features

- Accepts `Detections` objects for ground truth and prediction
- Unit tests in `tests/metrics/test_benchmark.py`

Motivation
Addresses Issue #1778: Improving object detection benchmarking process for unrelated datasets.
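For a quick sense of the intended API, a hypothetical usage sketch (the constructor keywords mirror the attribute names visible in the diff; the `evaluate()` method name is an assumption):

```python
import numpy as np
import supervision as sv

from supervision.metrics.benchmark import BenchmarkEvaluator

ground_truth = sv.Detections(
    xyxy=np.array([[10, 10, 50, 50]], dtype=float), class_id=np.array([0])
)
predictions = sv.Detections(
    xyxy=np.array([[12, 11, 49, 52]], dtype=float), class_id=np.array([0])
)

evaluator = BenchmarkEvaluator(ground_truth=ground_truth, predictions=predictions)
print(evaluator.evaluate())  # assumed output, e.g. {"precision": 1.0, "recall": 1.0}
```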
Let me know if you'd like me to extend this in future PRs with:
Thanks for the opportunity to contribute!