Add properties to FDP, fix changelog formatting (#42208)

nagkumar91 · Nagkumar Arkalgud · Nagkumar Arkalgud · web-flow · commit 20284890b073 · 2025-07-25T11:42:29.000-07:00
* Prepare evals SDK Release

* Fix bug

* Fix for ADV_CONV for FDP projects

* Update release date

* re-add pyrit to matrix

* Change grader ids

* Update unit test

* replace all old grader IDs in tests

* Update platform-matrix.json

Add pyrit without removing the other one

* Update test to ensure everything is mocked

* tox/black fixes

* Skip that test with issues

* update grader ID according to API View feedback

* Update test

* remove string check for grader ID

* Update changelog and officially start freeze

* update the enum according to suggestions

* update the changelog

* Finalize logic

* Initial plan

* Fix client request ID headers in azure-ai-evaluation

Co-authored-by: nagkumar91 <4727422+nagkumar91@users.noreply.github.com>

* Fix client request ID header format in rai_service.py

Co-authored-by: nagkumar91 <4727422+nagkumar91@users.noreply.github.com>

* Passing threshold in AzureOpenAIScoreModelGrader

* Add changelog

* Adding the self.pass_threshold instead of pass_threshold

* Add the python grader

* Remove redundant test

* Add class to exception list and format code

* Add properties to evaluation upload run for FDP

* Remove debug

* Remove the redundant property

* Fix changelog

* Fix the multiple features added section

* removed the properties in update

---------

Co-authored-by: Nagkumar Arkalgud <nagkumar@naarkalg-work-mac.local>
Co-authored-by: Nagkumar Arkalgud <nagkumar@Mac.lan>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: nagkumar91 <4727422+nagkumar91@users.noreply.github.com>
diff --git a/sdk/evaluation/azure-ai-evaluation/CHANGELOG.md b/sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -9,12 +9,11 @@
 ### Features Added
 
 - Added support for Azure OpenAI Python grader via `AzureOpenAIPythonGrader` class, which serves as a wrapper around Azure Open AI Python grader configurations. This new grader object can be supplied to the main `evaluate` method as if it were a normal callable evaluator.
-
-### Features Added
 - Added `attack_success_thresholds` parameter to `RedTeam` class for configuring custom thresholds that determine attack success. This allows users to set specific threshold values for each risk category, with scores greater than the threshold considered successful attacks (i.e. higher threshold means higher 
 tolerance for harmful responses).
 - Enhanced threshold reporting in RedTeam results to include default threshold values when custom thresholds aren't specified, providing better transparency about the evaluation criteria used.
 
+
 ### Bugs Fixed
 
 - Fixed red team scan `output_path` issue where individual evaluation results were overwriting each other instead of being preserved as separate files. Individual evaluations now create unique files while the user's `output_path` is reserved for final aggregated results.
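
Note on the `attack_success_thresholds` entry above: the rule it describes (a score greater than the threshold counts as a successful attack, with a default used when no custom value is given) can be sketched roughly as follows. This is not the `RedTeam` implementation. The helper names, the category string, and the default value of 3 are all hypothetical and exist only to illustrate the semantics.

```python
from typing import Dict, Optional

# Hypothetical default; the SDK's real default values are not shown in this diff.
DEFAULT_THRESHOLD = 3


def resolve_threshold(category: str, custom: Optional[Dict[str, int]] = None) -> int:
    """Return the custom threshold for a category, or the default if none is set."""
    if custom and category in custom:
        return custom[category]
    return DEFAULT_THRESHOLD


def is_attack_successful(score: int, threshold: int) -> bool:
    """A score strictly greater than the threshold counts as a successful attack."""
    return score > threshold


# Hypothetical category name and value, for illustration only.
custom_thresholds = {"violence": 5}
```

With a higher threshold, the same score is less likely to count as a successful attack, which is the "higher tolerance" the changelog entry refers to.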
diff --git a/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_utils.py b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_utils.py
@@ -178,7 +178,6 @@ def _log_metrics_and_instance_results_onedp(
 
         properties = {
             EvaluationRunProperties.RUN_TYPE: "eval_run",
-            EvaluationRunProperties.EVALUATION_RUN: "promptflow.BatchRun",
             EvaluationRunProperties.EVALUATION_SDK: f"azure-ai-evaluation:{VERSION}",
             "_azureml.evaluate_artifacts": json.dumps([{"path": artifact_name, "type": "table"}]),
         }
@@ -191,6 +190,7 @@ def _log_metrics_and_instance_results_onedp(
         upload_run_response = client.start_evaluation_run(
             evaluation=EvaluationUpload(
                 display_name=evaluation_name,
+                properties=properties,
             )
         )
 
@@ -202,7 +202,6 @@ def _log_metrics_and_instance_results_onedp(
                 outputs={
                     "evaluationResultId": create_evaluation_result_response.id,
                 },
-                properties=properties,
             ),
         )
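
Note on the hunks above: the net effect is that `properties` is sent when the run is created rather than in the later update, and the `EVALUATION_RUN` entry is dropped. A minimal sketch of that call order, using stub classes instead of the real SDK client, might look like the following. `StubEvaluationUpload`, `StubClient`, `update_run`, `upload`, and the property keys `run_type` and `evaluation_sdk` are hypothetical stand-ins; only `start_evaluation_run`, `display_name`, `properties`, `outputs`, `evaluationResultId`, and `_azureml.evaluate_artifacts` come from the diff.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StubEvaluationUpload:
    """Hypothetical stand-in for the upload model used in the diff."""
    display_name: str
    properties: Dict[str, str] = field(default_factory=dict)
    outputs: Dict[str, str] = field(default_factory=dict)


class StubClient:
    """Hypothetical in-memory client; it only records what it is given."""

    def __init__(self) -> None:
        self.runs: Dict[str, StubEvaluationUpload] = {}

    def start_evaluation_run(self, evaluation: StubEvaluationUpload) -> str:
        run_id = f"run-{len(self.runs) + 1}"
        self.runs[run_id] = evaluation
        return run_id

    def update_run(self, run_id: str, outputs: Dict[str, str]) -> None:
        # After this change the update carries outputs only, no properties.
        self.runs[run_id].outputs.update(outputs)


def upload(client: StubClient, evaluation_name: str, artifact_name: str,
           version: str, result_id: str) -> str:
    # Placeholder keys; the real ones come from EvaluationRunProperties.
    properties = {
        "run_type": "eval_run",
        "evaluation_sdk": f"azure-ai-evaluation:{version}",
        "_azureml.evaluate_artifacts": json.dumps(
            [{"path": artifact_name, "type": "table"}]
        ),
    }
    # Properties are attached at creation time.
    run_id = client.start_evaluation_run(
        StubEvaluationUpload(display_name=evaluation_name, properties=properties)
    )
    client.update_run(run_id, outputs={"evaluationResultId": result_id})
    return run_id
```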
 
