feat: add penalize_ambiguous_claims to AnswerRelevancyMetric #2573
Open
Krishnachaitanyakc wants to merge 2 commits into confident-ai:main from
Conversation
Add support for penalizing ambiguous ('idk') verdicts in the Answer
Relevancy metric, matching the existing behavior in the Faithfulness
metric. When enabled, statements with an 'idk' verdict are no longer
counted as relevant, giving users stricter control over scoring.
Closes confident-ai#2300
@Krishnachaitanyakc is attempting to deploy a commit to the Confident AI Team on Vercel. A member of the Team first needs to authorize it.
- Add missing blank line in test_span_interceptor.py to satisfy black
- Wrap metric instantiation in test app files with try/except helpers to avoid requiring API keys at module import time
- Fix incorrect ignore path for test_dataset_iterator.py in the test_core workflow (was test_tracing/test_dataset_iterator.py; actual location is test_tracing/test_integration/test_dataset_iterator.py)
- Add a secrets guard to the test_confident workflow so forked PRs skip gracefully instead of failing
Summary
- Adds a `penalize_ambiguous_claims` parameter to `AnswerRelevancyMetric`, matching the existing implementation in `FaithfulnessMetric`
- When `True`, statements with an `idk` verdict (ambiguous/supporting information) are no longer counted as relevant, giving users stricter scoring control
- Defaults to `False` to preserve backward compatibility

Closes #2300
Context
The `FaithfulnessMetric` already supports `penalize_ambiguous_claims` (introduced here). As noted in #2300, the `AnswerRelevancyMetric` was missing this feature even though its verdict schema already supports `idk` verdicts and the LLM evaluation prompt already instructs the model to produce them.

Changes
In `deepeval/metrics/answer_relevancy/answer_relevancy.py`:

- Add a `penalize_ambiguous_claims: bool = False` constructor parameter
- Update `_calculate_score()` to decrement `relevant_count` for `idk` verdicts when the flag is enabled (same logic as `FaithfulnessMetric._calculate_score()`)

Usage
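The original usage snippet was stripped during extraction. Based on deepeval's public API, it likely resembled the following (a sketch, not the PR's actual snippet; only the `penalize_ambiguous_claims` flag is new, and running it requires a configured LLM provider such as an `OPENAI_API_KEY`):

```python
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# penalize_ambiguous_claims=True makes 'idk' verdicts count against the score.
metric = AnswerRelevancyMetric(
    threshold=0.7,
    penalize_ambiguous_claims=True,
)

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital. I'm not sure about its population.",
)

metric.measure(test_case)
print(metric.score, metric.reason)
```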
Test plan
- `AnswerRelevancyMetric()` without the flag behaves identically to before (backward compatible)
- `AnswerRelevancyMetric(penalize_ambiguous_claims=True)` correctly penalizes `idk` verdicts in scoring
- Matches `FaithfulnessMetric._calculate_score()` behavior
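The test plan above can be illustrated with a standalone sketch of the scoring rule the PR describes (illustrative names, not deepeval's actual internals: start from the verdict count and decrement for non-relevant verdicts):

```python
def score(verdicts, penalize_ambiguous_claims=False):
    """Standalone stand-in for the metric's scoring rule (illustrative only)."""
    if not verdicts:
        return 1.0
    relevant = len(verdicts)
    for v in verdicts:
        if v == "no":
            relevant -= 1  # irrelevant statements always count against the score
        elif v == "idk" and penalize_ambiguous_claims:
            relevant -= 1  # ambiguous statements count against it only when flagged
    return relevant / len(verdicts)

verdicts = ["yes", "idk", "no"]
# Default: 'idk' still counts as relevant -> 2/3 (backward compatible).
assert abs(score(verdicts) - 2 / 3) < 1e-9
# Flag enabled: 'idk' is penalized -> 1/3.
assert abs(score(verdicts, penalize_ambiguous_claims=True) - 1 / 3) < 1e-9
```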