feat: add penalize_ambiguous_claims to AnswerRelevancyMetric #2573
Open
Krishnachaitanyakc wants to merge 2 commits into confident-ai:main from
Conversation
Add support for penalizing ambiguous ('idk') verdicts in the Answer
Relevancy metric, matching the existing behavior in the Faithfulness
metric. When enabled, statements with an 'idk' verdict are no longer
counted as relevant, giving users stricter control over scoring.
Closes confident-ai#2300
@Krishnachaitanyakc is attempting to deploy a commit to the Confident AI Team on Vercel. A member of the Team first needs to authorize it.
- Add missing blank line in test_span_interceptor.py to satisfy black
- Wrap metric instantiation in test app files with try/except helpers to avoid requiring API keys at module import time
- Fix incorrect ignore path for test_dataset_iterator.py in the test_core workflow (was test_tracing/test_dataset_iterator.py; actual location is test_tracing/test_integration/test_dataset_iterator.py)
- Add a secrets guard to the test_confident workflow so forked PRs skip gracefully instead of failing
Summary
- Adds a `penalize_ambiguous_claims` parameter to `AnswerRelevancyMetric`, matching the existing implementation in `FaithfulnessMetric`
- When `True`, statements with an `idk` verdict (ambiguous/supporting information) are no longer counted as relevant, giving users stricter scoring control
- Defaults to `False` to preserve backward compatibility

Closes #2300
Context
The `FaithfulnessMetric` already supports `penalize_ambiguous_claims` (introduced here). As noted in #2300, the `AnswerRelevancyMetric` was missing this feature even though its verdict schema already supports `idk` verdicts and the LLM evaluation prompt already instructs the model to produce them.

Changes
In `deepeval/metrics/answer_relevancy/answer_relevancy.py`:

- Add a `penalize_ambiguous_claims: bool = False` constructor parameter
- Update `_calculate_score()` to decrement `relevant_count` for `idk` verdicts when the flag is enabled (same logic as `FaithfulnessMetric._calculate_score()`)

Usage
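The original usage snippet was stripped during extraction. Based on deepeval's public API, it likely resembled the following (a sketch, not the PR's actual snippet; only the `penalize_ambiguous_claims` flag is new, and running it requires a configured LLM provider such as an `OPENAI_API_KEY`):

```python
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# penalize_ambiguous_claims=True makes 'idk' verdicts count against the score.
metric = AnswerRelevancyMetric(
    threshold=0.7,
    penalize_ambiguous_claims=True,
)

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital. I'm not sure about its population.",
)

metric.measure(test_case)
print(metric.score, metric.reason)
```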
Test plan
- `AnswerRelevancyMetric()` without the flag behaves identically to before (backward compatible)
- `AnswerRelevancyMetric(penalize_ambiguous_claims=True)` correctly penalizes `idk` verdicts in scoring
- Matches `FaithfulnessMetric._calculate_score()` behavior
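The test plan above can be illustrated with a standalone sketch of the scoring rule the PR describes (illustrative names, not deepeval's actual internals: start from the verdict count and decrement for non-relevant verdicts):

```python
def score(verdicts, penalize_ambiguous_claims=False):
    """Standalone stand-in for the metric's scoring rule (illustrative only)."""
    if not verdicts:
        return 1.0
    relevant = len(verdicts)
    for v in verdicts:
        if v == "no":
            relevant -= 1  # irrelevant statements always count against the score
        elif v == "idk" and penalize_ambiguous_claims:
            relevant -= 1  # ambiguous statements count against it only when flagged
    return relevant / len(verdicts)

verdicts = ["yes", "idk", "no"]
# Default: 'idk' still counts as relevant -> 2/3 (backward compatible).
assert abs(score(verdicts) - 2 / 3) < 1e-9
# Flag enabled: 'idk' is penalized -> 1/3.
assert abs(score(verdicts, penalize_ambiguous_claims=True) - 1 / 3) < 1e-9
```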