
feat: add penalize_ambiguous_claims to AnswerRelevancyMetric#2573

Open
Krishnachaitanyakc wants to merge 2 commits into confident-ai:main from Krishnachaitanyakc:feat/penalize-ambiguous-claims-answer-relevancy

Conversation

@Krishnachaitanyakc

Summary

  • Adds penalize_ambiguous_claims parameter to AnswerRelevancyMetric, matching the existing implementation in FaithfulnessMetric
  • When set to True, statements with an idk verdict (ambiguous or merely supporting information) are no longer counted as relevant, giving users stricter control over scoring
  • Defaults to False to preserve backward compatibility

Closes #2300

Context

The FaithfulnessMetric already supports penalize_ambiguous_claims (introduced here). As noted in #2300, the AnswerRelevancyMetric was missing this feature even though its verdict schema already supports idk verdicts and the LLM evaluation prompt already instructs the model to produce them.

Changes

  • deepeval/metrics/answer_relevancy/answer_relevancy.py:
    • Added penalize_ambiguous_claims: bool = False constructor parameter
    • Updated _calculate_score() to decrement relevant_count for idk verdicts when the flag is enabled (same logic as FaithfulnessMetric._calculate_score())
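A minimal sketch of the scoring change described above. `calculate_score` is a hypothetical standalone helper, not the actual `AnswerRelevancyMetric._calculate_score` method; it only illustrates how the flag shifts `idk` verdicts from relevant to irrelevant.

```python
def calculate_score(verdicts, penalize_ambiguous_claims=False):
    """Score = fraction of statements judged relevant.

    'yes' -> relevant
    'idk' -> relevant by default; counted as irrelevant when
             penalize_ambiguous_claims is True
    'no'  -> irrelevant
    """
    total = len(verdicts)
    if total == 0:
        # Assumption: an empty verdict list yields a perfect score.
        return 1.0

    relevant_count = total
    for verdict in verdicts:
        if verdict.lower() == "no":
            relevant_count -= 1
        elif penalize_ambiguous_claims and verdict.lower() == "idk":
            # The new behavior: ambiguous verdicts no longer count as relevant.
            relevant_count -= 1

    return relevant_count / total
```

With the flag off, `["yes", "idk", "no"]` scores 2/3; with the flag on, the same verdicts score 1/3.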

Usage

from deepeval.metrics import AnswerRelevancyMetric

# Default behavior (unchanged) - idk verdicts count as relevant
metric = AnswerRelevancyMetric()

# Stricter scoring - idk verdicts penalized
metric = AnswerRelevancyMetric(penalize_ambiguous_claims=True)

Test plan

  • Verify AnswerRelevancyMetric() without the flag behaves identically to before (backward compatible)
  • Verify AnswerRelevancyMetric(penalize_ambiguous_claims=True) correctly penalizes idk verdicts in scoring
  • Confirm the scoring logic matches FaithfulnessMetric._calculate_score() behavior
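The test plan above could be sketched as a pytest-style check. `score_with_flag` is a local stand-in for the metric's scoring logic (so the test runs without an LLM or API key); it is an assumption, not the real deepeval API.

```python
def score_with_flag(verdicts, penalize_ambiguous_claims=False):
    # Local reimplementation of the intended scoring rule, for testing only.
    if not verdicts:
        return 1.0
    relevant = sum(
        0 if v == "no" or (penalize_ambiguous_claims and v == "idk") else 1
        for v in verdicts
    )
    return relevant / len(verdicts)


def test_default_counts_idk_as_relevant():
    # Backward compatible: without the flag, 'idk' still counts as relevant.
    assert score_with_flag(["yes", "idk", "no"]) == 2 / 3


def test_flag_penalizes_idk():
    # With the flag, 'idk' is treated like 'no'.
    assert score_with_flag(["yes", "idk", "no"], penalize_ambiguous_claims=True) == 1 / 3
```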

Add support for penalizing ambiguous ('idk') verdicts in the Answer
Relevancy metric, matching the existing behavior in the Faithfulness
metric. When enabled, statements with an 'idk' verdict are no longer
counted as relevant, giving users stricter control over scoring.

Closes confident-ai#2300
@vercel

vercel bot commented Mar 25, 2026

@Krishnachaitanyakc is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

- Add missing blank line in test_span_interceptor.py to satisfy black
- Wrap metric instantiation in test app files with try/except helpers to
  avoid requiring API keys at module import time
- Fix incorrect ignore path for test_dataset_iterator.py in test_core workflow
  (was test_tracing/test_dataset_iterator.py, actual location is
  test_tracing/test_integration/test_dataset_iterator.py)
- Add secrets guard to test_confident workflow so forked PRs skip
  gracefully instead of failing


Development

Successfully merging this pull request may close these issues.

Feat request: penalize_ambiguous_claims for answer relevancy
