[MLOB] document metadata for custom evaluations #30530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open · wants to merge 2 commits into master
13 changes: 12 additions & 1 deletion content/en/llm_observability/sdk/_index.md
@@ -1234,6 +1234,10 @@ The `LLMObs.submit_evaluation_for()` method accepts the following arguments:
 `tags`
 : optional - _dictionary_
 <br />A dictionary of string key-value pairs that users can add as tags regarding the evaluation. For more information about tags, see [Getting Started with Tags][2].
+
+`metadata`
+: optional - _dictionary_
+<br />A JSON serializable dictionary of key-value metadata pairs relevant to the evaluation metric.
 {{% /collapse-content %}}

 #### Example
@@ -1262,6 +1266,7 @@ def llm_call():
         metric_type="score",
         value=10,
         tags={"evaluation_provider": "ragas"},
+        metadata={"flagged_segments": ["harmful part of output", "some other harmful part of output"]}
     )

     # joining an evaluation to a span via span ID and trace ID
@@ -1273,6 +1278,7 @@ def llm_call():
         metric_type="score",
         value=10,
         tags={"evaluation_provider": "ragas"},
+        metadata={"flagged_segments": ["harmful part of output", "some other harmful part of output"]}
     )
     return completion
 {{< /code-block >}}
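A note on the new argument: `metadata` must be JSON serializable, so rich objects should be flattened to plain types before they are passed in. A minimal sketch of that pre-flight check, where `SegmentReport` is a hypothetical caller-side dataclass rather than an SDK type:

```python
# Hypothetical pre-flight check for the `metadata` argument: the SDK
# expects a JSON-serializable dictionary, so convert rich objects to
# plain dicts first. `SegmentReport` is illustrative, not an SDK type.
import dataclasses
import json


@dataclasses.dataclass
class SegmentReport:
    flagged_segments: list[str]
    evaluator_version: str


report = SegmentReport(
    flagged_segments=["harmful part of output"],
    evaluator_version="1.2.0",
)

metadata = dataclasses.asdict(report)  # plain dict of JSON-safe values
json.dumps(metadata)  # raises TypeError if anything is not serializable
# now safe to pass as metadata=... in LLMObs.submit_evaluation_for(...)
```

Validating with `json.dumps` up front surfaces serialization problems at the call site instead of at submission time.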
@@ -1314,6 +1320,10 @@ The `evaluationOptions` object can contain the following:
 `tags`
 : optional - _dictionary_
 <br />A dictionary of string key-value pairs that users can add as tags regarding the evaluation. For more information about tags, see [Getting Started with Tags][1].
+
+`metadata`
+: optional - _dictionary_
+<br />A JSON serializable dictionary of key-value metadata pairs relevant to the evaluation metric.
 {{% /collapse-content %}}

 #### Example
@@ -1326,7 +1336,8 @@ function llmCall () {
     label: "harmfulness",
     metricType: "score",
     value: 10,
-    tags: { evaluationProvider: "ragas" }
+    tags: { evaluationProvider: "ragas" },
+    metadata: { flaggedSegments: ["harmful part of output", "some other harmful part of output"] }
   })
   return completion
 }
6 changes: 5 additions & 1 deletion content/en/llm_observability/setup/api.md
@@ -348,7 +348,10 @@ Evaluations must be joined to a unique span. You can identify the target span us
       "timestamp_ms": 1609479200,
       "metric_type": "score",
       "label": "Accuracy",
-      "score_value": 3
+      "score_value": 3,
+      "metadata": {
+        "flagged_segments": ["harmful part of output", "some other harmful part of output"]
+      }
     }
   ]
 }
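For reference, a sketch of how this payload might be submitted over HTTP from Python. The intake URL, the `DD-API-KEY` header, and the `data`/`attributes` envelope around the `metrics` array are assumptions based on the surrounding API docs rather than anything confirmed in this diff, and the span and trace IDs are placeholders:

```python
# Sketch of POSTing the evaluation payload above; the endpoint and
# envelope are assumed from the surrounding API docs, not verified here.
import os

import requests

payload = {
    "data": {
        "type": "evaluation_metric",
        "attributes": {
            "metrics": [
                {
                    "join_on": {  # see the JoinOn section below
                        "span": {"span_id": "<SPAN_ID>", "trace_id": "<TRACE_ID>"}
                    },
                    "timestamp_ms": 1609479200,
                    "metric_type": "score",
                    "label": "Accuracy",
                    "score_value": 3,
                    "metadata": {
                        "flagged_segments": [
                            "harmful part of output",
                            "some other harmful part of output",
                        ]
                    },
                }
            ]
        },
    }
}

response = requests.post(
    "https://api.datadoghq.com/api/intake/llm-obs/v2/eval-metric",  # assumed URL
    headers={"DD-API-KEY": os.environ["DD_API_KEY"]},
    json=payload,
)
response.raise_for_status()
```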
@@ -380,6 +383,7 @@ Evaluations must be joined to a unique span. You can identify the target span us
 | categorical_value [*required if the metric_type is "categorical"*] | string | A string representing the category that the evaluation belongs to. |
 | score_value [*required if the metric_type is "score"*] | number | A score value of the evaluation. |
 | tags | [[Tag](#tag)] | A list of tags to apply to this particular evaluation metric. |
+| metadata | Dict | Additional data relevant to the evaluation. |

 #### JoinOn