From b9838430fadb409f42a3f179df3a31999700418d Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Mon, 29 Sep 2025 23:55:30 -0700 Subject: [PATCH 01/15] adding readme --- .../samples/evaluator_catalog/README.md | 113 ++++++++++++++++++ 1 file changed, 113 insertions(+) create mode 100644 sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md new file mode 100644 index 000000000000..89ca7f6a7187 --- /dev/null +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -0,0 +1,113 @@ + +# How to publish new evaluator in Evaluator Catalog. + +This guild helps our partners to bring their evaluators into Microsoft provided Evaluator Catalog in Next Gen UI. + +## Context + +We are building an Evaluator Catalog, that will allow us to store Microsoft provided built-in evaluators, as well as Customer's provided custom evaluators. It will allow versioning support so that customer can maintain different version of custom evaluators. + +Using this catalog, customer can publish their custom evaluators under the project. Post Ignite, we'll allow them to prompt evaluators from projects to registries so that can share evaluators amount different projects. + +This evaluator catalog is backed by Generic Asset Service (that provides scalable and multi-region support to store all your assets in CosmosDB). + +Types of Built_in Evaluators +There are 3 types of evaluators we support as Built-In Evaluators. + +1. Code Based - It contains Python file +2. Code + Prompt Based - It contains Python file & Prompty file +3. Prompt Based - It contains only Prompty file. +4. Service Based - It references the evaluator from Evaluation SDK or RAI Service. + +## Step 1: + +Create builtin evaluator and use azure-ai-evaluation SDK to run locally. +List of evaluators can be found at [here](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators) + +# Step 2: Create a PR +Add a new folder with name as the Evaluator name. + +Please include following files. + +* asset.yaml +* spec.yaml +* 'evaluator' folder. Please include python files and prompty files in this folder. + +Please look at existing built-in evaluators for reference. +Location : [/assets/evaluators/builtin](https://msdata.visualstudio.com/Vienna/_git/azureml-asset?path=/assets/evaluators/builtin) + +Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azureml-asset/pullrequest/1816050?_a=files\) + +Please follow directions given below. + +## spec.yaml content + +```yml + +type: "evaluator" +name: "test.{name}" +version: 1 +displayName: "{display name}" +description: "{description}" +evaluatorType: "builtin" +evaluatorSubType: "code" +It represents what type of evaluator It is. +For #1 & #2 type evaluators, please add "code" +For #3 type evaluator, please provide "prompt" +For #4 type evaluator, please provide "service" + +**categories: ** +It represents an array of categories (Quality, Safety, Agents) +Example- ["Quality", "Safety"] + +**initParameterSchema:** +The JSON schema (Draft 2020-12) for the evaluator's input parameters. This includes parameters like type, properties, required. 
+Example- + type: "object" + properties: + threshold: + type: "number" + minimumValue: 0 + maximumValue: 1 + step: 0.1 + required: ["threshold"] + + +**dataMappingSchema:** +The JSON schema (Draft 2020-12) for the evaluator's input data. This includes parameters like type, properties, required. +Example- + type: "object" + properties: + ground_truth: + type: "string" + response: + type: "string" + required: ["ground_truth", "response"] + +**outputSchema:** +List of output metrics produced by this evaluator +Example- + bleu: + type: "continuous" + desirable_direction: "increase" + min_value: 0 + max_value: 1 + +path: ./evaluator +``` + +# Step 3: +When PR is merged. Evaluation Team will be able to kick off the CI Pipeline to publish evaluator in the Evaluator Catalog. +This is done is 2 steps. + +In Step 1, new evaluator is published in azureml-dev registry so that I can be tested in INT environment. Once all looks good, Step 2 is performed. +In Step 2, new evaluator is published in azure-ml registry (for Production). + + +# Step 4: +Now, use Evaluators CRUD APIs to view evaluator in GET /evaluator list. + +Use following links + +INT: +PROD: \ No newline at end of file From 9a4b6ef86144aedba7427c21f4d8be0e3f519aaa Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 00:20:25 -0700 Subject: [PATCH 02/15] adding readme --- .../samples/evaluator_catalog/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 89ca7f6a7187..07b47d1e939d 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -19,12 +19,12 @@ There are 3 types of evaluators we support as Built-In Evaluators. 3. Prompt Based - It contains only Prompty file. 4. Service Based - It references the evaluator from Evaluation SDK or RAI Service. -## Step 1: +## Step 1: Run Evaluator with SDK. Create builtin evaluator and use azure-ai-evaluation SDK to run locally. List of evaluators can be found at [here](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators) -# Step 2: Create a PR +## Step 2: Create a PR Add a new folder with name as the Evaluator name. Please include following files. @@ -96,7 +96,7 @@ Example- path: ./evaluator ``` -# Step 3: +## Step 3: Publish When PR is merged. Evaluation Team will be able to kick off the CI Pipeline to publish evaluator in the Evaluator Catalog. This is done is 2 steps. @@ -104,7 +104,7 @@ In Step 1, new evaluator is published in azureml-dev registry so that I can be t In Step 2, new evaluator is published in azure-ml registry (for Production). -# Step 4: +## Step 4: Verify Evaluator Now, use Evaluators CRUD APIs to view evaluator in GET /evaluator list. 
Use following links From 8ff4d7a9672c7a14b2ca33dd7666f33432aeb22a Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 00:20:45 -0700 Subject: [PATCH 03/15] Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 07b47d1e939d..1af6fce0b928 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -1,7 +1,7 @@ # How to publish new evaluator in Evaluator Catalog. -This guild helps our partners to bring their evaluators into Microsoft provided Evaluator Catalog in Next Gen UI. +This guide helps our partners to bring their evaluators into Microsoft provided Evaluator Catalog in Next Gen UI. ## Context From dda00646c28b1b6ea6838366ccd73acba58277a5 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 00:20:56 -0700 Subject: [PATCH 04/15] Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 1af6fce0b928..2f167322c0e1 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -7,7 +7,7 @@ This guide helps our partners to bring their evaluators into Microsoft provided We are building an Evaluator Catalog, that will allow us to store Microsoft provided built-in evaluators, as well as Customer's provided custom evaluators. It will allow versioning support so that customer can maintain different version of custom evaluators. -Using this catalog, customer can publish their custom evaluators under the project. Post Ignite, we'll allow them to prompt evaluators from projects to registries so that can share evaluators amount different projects. +Using this catalog, customer can publish their custom evaluators under the project. Post Ignite, we'll allow them to prompt evaluators from projects to registries so that can share evaluators among different projects. This evaluator catalog is backed by Generic Asset Service (that provides scalable and multi-region support to store all your assets in CosmosDB). 
From 6101c9c99316f95313dcad12b91ef730dbbe731b Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 00:21:07 -0700 Subject: [PATCH 05/15] Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 2f167322c0e1..8d62e31412b8 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -100,7 +100,7 @@ path: ./evaluator When PR is merged. Evaluation Team will be able to kick off the CI Pipeline to publish evaluator in the Evaluator Catalog. This is done is 2 steps. -In Step 1, new evaluator is published in azureml-dev registry so that I can be tested in INT environment. Once all looks good, Step 2 is performed. +In Step 1, new evaluator is published in azureml-dev registry so that it can be tested in INT environment. Once all looks good, Step 2 is performed. In Step 2, new evaluator is published in azure-ml registry (for Production). From 97377a975ec2d34ae0959be4f1aa3eb345882f56 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 19:06:00 -0700 Subject: [PATCH 06/15] adding readme --- .../samples/evaluator_catalog/README.md | 92 +++++++++---------- 1 file changed, 46 insertions(+), 46 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 07b47d1e939d..5eb6534f89f9 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -40,59 +40,59 @@ Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azu Please follow directions given below. -## spec.yaml content +## Asset Content - spec.yaml + + +| Asset Property | API Property | Example | Description | +| - | - | - | - | +| type | type | evaluator | It is always 'evaluator'. It identifies type of the asset. | +| name | name | test.f1_score| Name of the evaluator, alway in URL | +| version | version | 1 | It is auto incremented version number, starts with 1 | +| displayName: | display name | F1 Score | It is the name of the evaluator shown in UI | +| description: | description | | This is description of the evaluator. | +| evaluatorType: | evaluator_type | "builtin"| For Built-in evaluators, value is "builtin". For custom evaluators, value is "custom". API only supports 'custom'| +| evaluatorSubType | definition.type | "code" | It represents what type of evaluator It is. For #1 & #2 type evaluators, please add "code". For #3 type evaluator, please provide "prompt". For #4 type evaluator, please provide "service" | +| categories | categories | ["Quality"] | The categories of the evaluator. It's an array. Allowed values are Quality, Safety, Agents. Multiple values are allowed | +| initParameterSchema | | | The JSON schema (Draft 2020-12) for the evaluator's input parameters. This includes parameters like type, properties, required. | +| dataMappingSchema | | | The JSON schema (Draft 2020-12) for the evaluator's input data. This includes parameters like type, properties, required. 
| +| outputSchema | | | List of output metrics produced by this evaluator | +| path | Not expose in API | ./evaluator | Fixed. | + +Example: ```yml type: "evaluator" -name: "test.{name}" +name: "test.bleu_score" version: 1 -displayName: "{display name}" -description: "{description}" +displayName: "Bleu-Score-Evaluator" +description: "| | |\n| -- | -- |\n| Score range | Float [0-1]: higher means better quality. |\n| What is this metric? | BLEU (Bilingual Evaluation Understudy) score is commonly used in natural language processing (NLP) and machine translation. It measures how closely the generated text matches the reference text. |\n| How does it work? | The BLEU score calculates the geometric mean of the precision of n-grams between the model-generated text and the reference text, with an added brevity penalty for shorter generated text. The precision is computed for unigrams, bigrams, trigrams, etc., depending on the desired BLEU score level. The more n-grams that are shared between the generated and reference texts, the higher the BLEU score. |\n| When to use it? | The recommended scenario is Natural Language Processing (NLP) tasks. It's widely used in text summarization and text generation use cases. |\n| What does it need as input? | Response, Ground Truth |\n" evaluatorType: "builtin" evaluatorSubType: "code" -It represents what type of evaluator It is. -For #1 & #2 type evaluators, please add "code" -For #3 type evaluator, please provide "prompt" -For #4 type evaluator, please provide "service" - -**categories: ** -It represents an array of categories (Quality, Safety, Agents) -Example- ["Quality", "Safety"] - -**initParameterSchema:** -The JSON schema (Draft 2020-12) for the evaluator's input parameters. This includes parameters like type, properties, required. -Example- - type: "object" - properties: - threshold: - type: "number" - minimumValue: 0 - maximumValue: 1 - step: 0.1 - required: ["threshold"] - - -**dataMappingSchema:** -The JSON schema (Draft 2020-12) for the evaluator's input data. This includes parameters like type, properties, required. 
-Example- - type: "object" - properties: - ground_truth: - type: "string" - response: - type: "string" - required: ["ground_truth", "response"] - -**outputSchema:** -List of output metrics produced by this evaluator -Example- - bleu: - type: "continuous" - desirable_direction: "increase" - min_value: 0 - max_value: 1 - +categories: ["quality"] +initParameterSchema: + type: "object" + properties: + threshold: + type: "number" + minimumValue: 0 + maximumValue: 1 + step: 0.1 + required: ["threshold"] +dataMappingSchema: + type: "object" + properties: + ground_truth: + type: "string" + response: + type: "string" + required: ["ground_truth", "response"] +outputSchema: + bleu: + type: "continuous" + desirable_direction: "increase" + min_value: 0 + max_value: 1 path: ./evaluator ``` From 5451b9d4521a2dcd2ca3a3fac00a436a35fabb69 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 19:06:40 -0700 Subject: [PATCH 07/15] fixed --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 5eb6534f89f9..11691ae5d23b 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -75,9 +75,9 @@ initParameterSchema: properties: threshold: type: "number" - minimumValue: 0 - maximumValue: 1 - step: 0.1 + minimum: 0 + maximum: 1 + multipleOf: 0.1 required: ["threshold"] dataMappingSchema: type: "object" From 7b985581306037cb74f45d964fcc88896f4cd110 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 19:40:15 -0700 Subject: [PATCH 08/15] fixed --- .../samples/evaluator_catalog/README.md | 52 +++++++++++++------ 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 0b5357a3dd00..c8ee03e40643 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -19,26 +19,38 @@ There are 3 types of evaluators we support as Built-In Evaluators. 3. Prompt Based - It contains only Prompty file. 4. Service Based - It references the evaluator from Evaluation SDK or RAI Service. -## Step 1: Run Evaluator with SDK. +## Step 1: Run Your evaluators with Evaluation SDK. Create builtin evaluator and use azure-ai-evaluation SDK to run locally. List of evaluators can be found at [here](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators) -## Step 2: Create a PR -Add a new folder with name as the Evaluator name. +## Step 2: Create a PR to provide -Please include following files. +We are storing all the builtin evaluators in Azureml-asset Repo. Please provide your evaluators files by creating a PR in this repo. Please follow the steps. + +1. Add a new folder with name as the Evaluator name. + +2. Please include following files. * asset.yaml * spec.yaml -* 'evaluator' folder. Please include python files and prompty files in this folder. +* 'evaluator' folder. + +This 'evaluator' folder contains two files. +1. Python file name should be same as evaluator name with '_' prefix. +2. 
Prompty file name should be same as evaluator name with .prompty extension. + +Example: Coherence evaluator contains 2 files. +_coherence.py +coherence.prompty Please look at existing built-in evaluators for reference. Location : [/assets/evaluators/builtin](https://msdata.visualstudio.com/Vienna/_git/azureml-asset?path=/assets/evaluators/builtin) - Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azureml-asset/pullrequest/1816050?_a=files\) -Please follow directions given below. +3. Please copy asset.yaml from sample. No change is required. + +4. Please follow directions given below to create spec.yaml. ## Asset Content - spec.yaml @@ -96,18 +108,24 @@ outputSchema: path: ./evaluator ``` -## Step 3: Publish -When PR is merged. Evaluation Team will be able to kick off the CI Pipeline to publish evaluator in the Evaluator Catalog. -This is done is 2 steps. +## Step 3: Test in RAI Service ACA Code. + +Once PR is merged, Evaluation Team will use your evaluator files to run them in ACA to make sure no errors. You also need to provide jsonl dataset files for testing. + +## Step 4: Publish on Dev Registry (Azureml-dev) +When PR is review and merged. Evaluation Team will be able to kick off the CI Pipeline to publish evaluator in the Evaluator Catalog in azureml-dev registry. -In Step 1, new evaluator is published in azureml-dev registry so that it can be tested in INT environment. Once all looks good, Step 2 is performed. -In Step 2, new evaluator is published in azure-ml registry (for Production). +## Step 5: Test is INT Environment +Team will verify following items: +1. Verify if new evaluator is available in Evaluator REST APIs. +2. Verify if Evaluation API (Eval Run and Open AI Eval) both are able to reference these evaluators from Evaluator Catalog and run in ACA. -## Step 4: Verify Evaluator -Now, use Evaluators CRUD APIs to view evaluator in GET /evaluator list. +## Step 6: Publish on Prod Registry (Azureml) +Evaluation Team will be able to kick off the CI Pipeline again to publish evaluator in the Evaluator Catalog in azureml registry. -Use following links +## Step 7: Test is Prod Environment +Team will verify following items: -INT: -PROD: \ No newline at end of file +1. Verify if new evaluator is available in Evaluator REST APIs. +2. Verify if Evaluation API (Eval Run and Open AI Eval) both are able to reference these evaluators from Evaluator Catalog and run in ACA. \ No newline at end of file From 3754b2f27ccefac1488e959bc913ec49c79100ff Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 19:44:10 -0700 Subject: [PATCH 09/15] fixed --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index c8ee03e40643..3284103be910 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -65,9 +65,9 @@ Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azu | evaluatorType: | evaluator_type | "builtin"| For Built-in evaluators, value is "builtin". For custom evaluators, value is "custom". API only supports 'custom'| | evaluatorSubType | definition.type | "code" | It represents what type of evaluator It is. For #1 & #2 type evaluators, please add "code". 
For #3 type evaluator, please provide "prompt". For #4 type evaluator, please provide "service" | | categories | categories | ["Quality"] | The categories of the evaluator. It's an array. Allowed values are Quality, Safety, Agents. Multiple values are allowed | -| initParameterSchema | | | The JSON schema (Draft 2020-12) for the evaluator's input parameters. This includes parameters like type, properties, required. | -| dataMappingSchema | | | The JSON schema (Draft 2020-12) for the evaluator's input data. This includes parameters like type, properties, required. | -| outputSchema | | | List of output metrics produced by this evaluator | +| initParameterSchema | init_parameters | | The JSON schema (Draft 2020-12) for the evaluator's input parameters. This includes parameters like type, properties, required. | +| dataMappingSchema | data_schema | | The JSON schema (Draft 2020-12) for the evaluator's input data. This includes parameters like type, properties, required. | +| outputSchema | metrics | | List of output metrics produced by this evaluator | | path | Not expose in API | ./evaluator | Fixed. | Example: From bad07d9d380225c4e165aebe9e10bf24dfb06a65 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 19:44:58 -0700 Subject: [PATCH 10/15] fixed --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 3284103be910..d0b5c0266cf3 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -1,5 +1,5 @@ -# How to publish new evaluator in Evaluator Catalog. +# How to publish new built-in evaluator in Evaluator Catalog. This guide helps our partners to bring their evaluators into Microsoft provided Evaluator Catalog in Next Gen UI. From 7c39d050e17deb3dd70d1a0b058a43d615fa01c7 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 19:48:33 -0700 Subject: [PATCH 11/15] fixed --- .../samples/evaluator_catalog/README.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index d0b5c0266cf3..51031aac3011 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -110,22 +110,24 @@ path: ./evaluator ## Step 3: Test in RAI Service ACA Code. -Once PR is merged, Evaluation Team will use your evaluator files to run them in ACA to make sure no errors. You also need to provide jsonl dataset files for testing. +Once PR is reviewed and merged, Evaluation Team will use your evaluator files to run them in ACA to make sure no errors found. You also need to provide jsonl dataset files for testing. ## Step 4: Publish on Dev Registry (Azureml-dev) -When PR is review and merged. Evaluation Team will be able to kick off the CI Pipeline to publish evaluator in the Evaluator Catalog in azureml-dev registry. +Evaluation Team will kick off the CI Pipeline to publish evaluator in the Evaluator Catalog in azureml-dev (dev) registry. 
## Step 5: Test is INT Environment -Team will verify following items: +Team will verify following: 1. Verify if new evaluator is available in Evaluator REST APIs. -2. Verify if Evaluation API (Eval Run and Open AI Eval) both are able to reference these evaluators from Evaluator Catalog and run in ACA. +2. Verify if there are rendered correctly in NextGen UI. +3. Verify if Evaluation API (Eval Run and Open AI Eval) both are able to reference these evaluators from Evaluator Catalog and run in ACA. ## Step 6: Publish on Prod Registry (Azureml) -Evaluation Team will be able to kick off the CI Pipeline again to publish evaluator in the Evaluator Catalog in azureml registry. +Evaluation Team will be able to kick off the CI Pipeline again to publish evaluator in the Evaluator Catalog in azureml (prod) registry. ## Step 7: Test is Prod Environment Team will verify following items: 1. Verify if new evaluator is available in Evaluator REST APIs. -2. Verify if Evaluation API (Eval Run and Open AI Eval) both are able to reference these evaluators from Evaluator Catalog and run in ACA. \ No newline at end of file +2. Verify if there are rendered correctly in NextGen UI. +3. Verify if Evaluation API (Eval Run and Open AI Eval) both are able to reference these evaluators from Evaluator Catalog and run in ACA. From dbb51346a6d7c1574ba349d2e1c1aacfd3e58b07 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 20:22:26 -0700 Subject: [PATCH 12/15] fixed --- .../samples/evaluator_catalog/README.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index 51031aac3011..a495d2a57839 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -5,11 +5,13 @@ This guide helps our partners to bring their evaluators into Microsoft provided ## Context -We are building an Evaluator Catalog, that will allow us to store Microsoft provided built-in evaluators, as well as Customer's provided custom evaluators. It will allow versioning support so that customer can maintain different version of custom evaluators. +We are building an Evaluator Catalog, that will allow us to store built-in evaluators (provided by Microsoft), as well as 1P/3P customer's provided evaluators. -Using this catalog, customer can publish their custom evaluators under the project. Post Ignite, we'll allow them to prompt evaluators from projects to registries so that can share evaluators among different projects. +We are also building Evaluators CRUD API and SDK experience which can be used by our external customer to create custom evaluators. NextGen UI will leverage these new APIs to list evaluators in Evaluation Section. -This evaluator catalog is backed by Generic Asset Service (that provides scalable and multi-region support to store all your assets in CosmosDB). +These custom evaluators are also stored in the Evaluator Catalog, but the scope these evaluator will be at project level at Ignite. Post Ignite, we'll allow customers to share their evaluators among different projects. + +This evaluator catalog is backed by Generic Asset Service (that provides versioning support as well as scalable and multi-region support to store all your assets in CosmosDB). 
Types of Built_in Evaluators There are 3 types of evaluators we support as Built-In Evaluators. @@ -17,14 +19,14 @@ There are 3 types of evaluators we support as Built-In Evaluators. 1. Code Based - It contains Python file 2. Code + Prompt Based - It contains Python file & Prompty file 3. Prompt Based - It contains only Prompty file. -4. Service Based - It references the evaluator from Evaluation SDK or RAI Service. +4. Service Based - It references the evaluator in RAI Service that calls fine tuned models provided by Data Science Team. ## Step 1: Run Your evaluators with Evaluation SDK. Create builtin evaluator and use azure-ai-evaluation SDK to run locally. List of evaluators can be found at [here](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators) -## Step 2: Create a PR to provide +## Step 2: Provide your evaluator We are storing all the builtin evaluators in Azureml-asset Repo. Please provide your evaluators files by creating a PR in this repo. Please follow the steps. @@ -50,11 +52,10 @@ Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azu 3. Please copy asset.yaml from sample. No change is required. -4. Please follow directions given below to create spec.yaml. +4. Please follow steps given below to create spec.yaml. ## Asset Content - spec.yaml - | Asset Property | API Property | Example | Description | | - | - | - | - | | type | type | evaluator | It is always 'evaluator'. It identifies type of the asset. | From c49b7f673a4fc89b60cfd03852b780add4fb8f4c Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 20:33:20 -0700 Subject: [PATCH 13/15] fixed --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index a495d2a57839..d919c91b2d64 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -48,7 +48,7 @@ coherence.prompty Please look at existing built-in evaluators for reference. Location : [/assets/evaluators/builtin](https://msdata.visualstudio.com/Vienna/_git/azureml-asset?path=/assets/evaluators/builtin) -Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azureml-asset/pullrequest/1816050?_a=files\) +Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azureml-asset/pullrequest/1816050) 3. Please copy asset.yaml from sample. No change is required. From a2a4507ae1b0df0f3d8d7202b8fa631222a6bbe3 Mon Sep 17 00:00:00 2001 From: Waqas Javed <7674577+w-javed@users.noreply.github.com> Date: Tue, 30 Sep 2025 20:34:17 -0700 Subject: [PATCH 14/15] fixed --- .../azure-ai-evaluation/samples/evaluator_catalog/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md index d919c91b2d64..fb3b2706b2a9 100644 --- a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md +++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md @@ -47,8 +47,8 @@ _coherence.py coherence.prompty Please look at existing built-in evaluators for reference. 
-Location : [/assets/evaluators/builtin](https://msdata.visualstudio.com/Vienna/_git/azureml-asset?path=/assets/evaluators/builtin)
-Sample PR: [pullrequest/1816050](https://msdata.visualstudio.com/Vienna/_git/azureml-asset/pullrequest/1816050)
+* Evaluator Catalog Repo : [/assets/evaluators/builtin](https://msdata.visualstudio.com/Vienna/_git/azureml-asset?path=/assets/evaluators/builtin)
+* Sample PR: [PR 1816050](https://msdata.visualstudio.com/Vienna/_git/azureml-asset/pullrequest/1816050)

 3. Please copy asset.yaml from sample. No change is required.

From 6a9c774ae811d4088f810bb40676ed49e4f349ab Mon Sep 17 00:00:00 2001
From: Waqas Javed <7674577+w-javed@users.noreply.github.com>
Date: Fri, 3 Oct 2025 15:33:31 -0700
Subject: [PATCH 15/15] linkfix
---
 .../samples/evaluator_catalog/README.md | 134 ++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md

diff --git a/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md
new file mode 100644
index 000000000000..13911e134c90
--- /dev/null
+++ b/sdk/evaluation/azure-ai-evaluation/samples/evaluator_catalog/README.md
@@ -0,0 +1,134 @@

# How to publish a new built-in evaluator in the Evaluator Catalog

This guide helps our partners bring their evaluators into the Microsoft-provided Evaluator Catalog in the Next Gen UI.

## Context

We are building an Evaluator Catalog that will allow us to store built-in evaluators (provided by Microsoft) as well as evaluators provided by 1P/3P customers.

We are also building an Evaluators CRUD API and an SDK experience that external customers can use to create custom evaluators. The NextGen UI will leverage these new APIs to list evaluators in the Evaluation section.

These custom evaluators are also stored in the Evaluator Catalog, but at Ignite their scope will be limited to the project level. Post Ignite, we'll allow customers to share their evaluators among different projects.

The Evaluator Catalog is backed by the Generic Asset Service, which provides versioning support as well as scalable, multi-region storage for all your assets in CosmosDB.

## Types of built-in evaluators

There are 4 types of evaluators we support as built-in evaluators:

1. Code based - contains a Python file.
2. Code + prompt based - contains a Python file and a Prompty file.
3. Prompt based - contains only a Prompty file.
4. Service based - references an evaluator in the RAI Service that calls fine-tuned models provided by the Data Science Team.

## Step 1: Run your evaluator with the Evaluation SDK

Create the built-in evaluator and use the azure-ai-evaluation SDK to run it locally; a minimal local-run sketch is included at the end of this guide.
The list of existing evaluators can be found [here](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators).

## Step 2: Provide your evaluator

We store all the built-in evaluators in the azureml-assets repo. Please provide your evaluator files by creating a PR in this repo, following the steps below.

1. Add a new folder named after the evaluator.

2. Please include the following files:

* asset.yaml
* spec.yaml
* an 'evaluator' folder.

The 'evaluator' folder contains two files:
1. A Python file whose name is the evaluator name with an '_' prefix.
2. A Prompty file whose name is the evaluator name with a .prompty extension.

Example: the Coherence evaluator contains 2 files, `_coherence.py` and `coherence.prompty`; a sketch of the full folder layout is shown below.
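A rough sketch of the expected folder layout, using the coherence evaluator as the example (actual contents vary per evaluator):

```text
coherence/
├── asset.yaml
├── spec.yaml
└── evaluator/
    ├── _coherence.py       # Python implementation: evaluator name with an '_' prefix
    └── coherence.prompty   # Prompty file: evaluator name with a .prompty extension
```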
Please look at existing built-in evaluators for reference.
* Evaluator Catalog Repo : [/assets/evaluators/builtin](https://github.com/Azure/azureml-assets/tree/main/assets/evaluators/builtin)
* Sample PR: [PR 1816050](https://msdata.visualstudio.com/Vienna/_git/azureml-asset/pullrequest/1816050)

3. Please copy asset.yaml from the sample. No change is required.

4. Please follow the steps given below to create spec.yaml.

## Asset Content - spec.yaml

| Asset Property | API Property | Example | Description |
| - | - | - | - |
| type | type | evaluator | Always 'evaluator'. It identifies the type of the asset. |
| name | name | test.f1_score | Name of the evaluator; it is always used in the URL. |
| version | version | 1 | Auto-incremented version number, starting at 1. |
| displayName | display name | F1 Score | The name of the evaluator shown in the UI. |
| description | description | | The description of the evaluator. |
| evaluatorType | evaluator_type | "builtin" | For built-in evaluators, the value is "builtin". For custom evaluators, the value is "custom". The API only supports 'custom'. |
| evaluatorSubType | definition.type | "code" | The type of evaluator. For #1 & #2 type evaluators, please use "code". For #3 type evaluators, please use "prompt". For #4 type evaluators, please use "service". |
| categories | categories | ["Quality"] | The categories of the evaluator. It's an array; allowed values are Quality, Safety, Agents. Multiple values are allowed. |
| initParameterSchema | init_parameters | | The JSON schema (Draft 2020-12) for the evaluator's input parameters. This includes keywords like type, properties, required. |
| dataMappingSchema | data_schema | | The JSON schema (Draft 2020-12) for the evaluator's input data. This includes keywords like type, properties, required. |
| outputSchema | metrics | | List of output metrics produced by this evaluator. |
| path | Not exposed in API | ./evaluator | Fixed. |

Example:

```yml
type: "evaluator"
name: "test.bleu_score"
version: 1
displayName: "Bleu-Score-Evaluator"
description: "| | |\n| -- | -- |\n| Score range | Float [0-1]: higher means better quality. |\n| What is this metric? | BLEU (Bilingual Evaluation Understudy) score is commonly used in natural language processing (NLP) and machine translation. It measures how closely the generated text matches the reference text. |\n| How does it work? | The BLEU score calculates the geometric mean of the precision of n-grams between the model-generated text and the reference text, with an added brevity penalty for shorter generated text. The precision is computed for unigrams, bigrams, trigrams, etc., depending on the desired BLEU score level. The more n-grams that are shared between the generated and reference texts, the higher the BLEU score. |\n| When to use it? | The recommended scenario is Natural Language Processing (NLP) tasks. It's widely used in text summarization and text generation use cases. |\n| What does it need as input? | Response, Ground Truth |\n"
evaluatorType: "builtin"
evaluatorSubType: "code"
categories: ["quality"]
initParameterSchema:
  type: "object"
  properties:
    threshold:
      type: "number"
      minimum: 0
      maximum: 1
      multipleOf: 0.1
  required: ["threshold"]
dataMappingSchema:
  type: "object"
  properties:
    ground_truth:
      type: "string"
    response:
      type: "string"
  required: ["ground_truth", "response"]
outputSchema:
  bleu:
    type: "continuous"
    desirable_direction: "increase"
    min_value: 0
    max_value: 1
path: ./evaluator
```

## Step 3: Test in RAI Service ACA

Once the PR is reviewed and merged, the Evaluation Team will run your evaluator files in ACA to make sure no errors are found. You also need to provide jsonl dataset files for testing; a small example dataset is sketched below.
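For reference, a minimal test dataset matching the dataMappingSchema from the example above (one JSON object per line, with ground_truth and response fields) might look like this; the rows are purely illustrative:

```jsonl
{"ground_truth": "The capital of France is Paris.", "response": "Paris is the capital of France."}
{"ground_truth": "Water boils at 100 degrees Celsius at sea level.", "response": "At sea level, water boils at 100 degrees Celsius."}
```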
## Step 4: Publish on Dev Registry (Azureml-dev)

The Evaluation Team will kick off the CI pipeline to publish the evaluator in the Evaluator Catalog in the azureml-dev (dev) registry.

## Step 5: Test in INT Environment

The team will verify the following:

1. Verify that the new evaluator is available in the Evaluator REST APIs.
2. Verify that it is rendered correctly in the NextGen UI.
3. Verify that the Evaluation APIs (Eval Run and OpenAI Eval) are both able to reference the evaluator from the Evaluator Catalog and run it in ACA.

## Step 6: Publish on Prod Registry (Azureml)

The Evaluation Team will kick off the CI pipeline again to publish the evaluator in the Evaluator Catalog in the azureml (prod) registry.

## Step 7: Test in Prod Environment

The team will verify the following:

1. Verify that the new evaluator is available in the Evaluator REST APIs.
2. Verify that it is rendered correctly in the NextGen UI.
3. Verify that the Evaluation APIs (Eval Run and OpenAI Eval) are both able to reference the evaluator from the Evaluator Catalog and run it in ACA.
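Finally, as referenced in Step 1, the sketch below shows one way to smoke-test an existing built-in evaluator locally with the azure-ai-evaluation SDK before submitting it to the catalog. It assumes the package is installed (`pip install azure-ai-evaluation`) and uses the BLEU score evaluator from the spec.yaml example above; the input strings are illustrative only.

```python
# Minimal local smoke test of a built-in evaluator using the azure-ai-evaluation SDK.
# Assumes: pip install azure-ai-evaluation
from azure.ai.evaluation import BleuScoreEvaluator

# Code-based evaluators such as BLEU need no model configuration.
bleu = BleuScoreEvaluator()

# Call the evaluator on a single row that matches the dataMappingSchema
# from the spec.yaml example (ground_truth + response).
result = bleu(
    ground_truth="The capital of France is Paris.",
    response="Paris is the capital of France.",
)

# The evaluator returns a dict of metrics, e.g. {"bleu_score": ...}.
print(result)
```

Prompt-based evaluators (for example, coherence) additionally take a model configuration at construction time, so the same pattern applies once that configuration is supplied.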