Bugfix/evaluate mapping #42491

Draft
wants to merge 72 commits into base: main

Changes from all commits (72 commits)
4318329
Prepare evals SDK Release
May 28, 2025
192b980
Fix bug
May 28, 2025
758adb4
Fix for ADV_CONV for FDP projects
May 29, 2025
de09fd1
Update release date
May 29, 2025
ef60fe6
Merge branch 'main' into main
nagkumar91 May 29, 2025
8ca51d0
Merge branch 'Azure:main' into main
nagkumar91 May 30, 2025
98bfc3a
Merge branch 'Azure:main' into main
nagkumar91 Jun 2, 2025
a5f32e8
Merge branch 'Azure:main' into main
nagkumar91 Jun 9, 2025
5fd88b6
Merge branch 'Azure:main' into main
nagkumar91 Jun 10, 2025
51f2b44
Merge branch 'Azure:main' into main
nagkumar91 Jun 10, 2025
a5be8b5
Merge branch 'Azure:main' into main
nagkumar91 Jun 16, 2025
75965b7
Merge branch 'Azure:main' into main
nagkumar91 Jun 25, 2025
d0c5e53
Merge branch 'Azure:main' into main
nagkumar91 Jun 25, 2025
b790276
Merge branch 'Azure:main' into main
nagkumar91 Jun 26, 2025
d5ca243
Merge branch 'Azure:main' into main
nagkumar91 Jun 26, 2025
8d62e36
re-add pyrit to matrix
Jun 26, 2025
59a70f2
Change grader ids
Jun 26, 2025
4d146d7
Merge branch 'Azure:main' into main
nagkumar91 Jun 26, 2025
f7a4c83
Update unit test
Jun 27, 2025
79e3a40
replace all old grader IDs in tests
Jun 27, 2025
588cbec
Merge branch 'main' into main
nagkumar91 Jun 30, 2025
7514472
Update platform-matrix.json
nagkumar91 Jun 30, 2025
28b2513
Update test to ensure everything is mocked
Jul 1, 2025
8603e0e
tox/black fixes
Jul 1, 2025
895f226
Skip that test with issues
Jul 1, 2025
b4b2daf
Merge branch 'Azure:main' into main
nagkumar91 Jul 1, 2025
023f07f
update grader ID according to API View feedback
Jul 1, 2025
45b5f5d
Update test
Jul 2, 2025
1ccb4db
remove string check for grader ID
Jul 2, 2025
6fd9aa5
Merge branch 'Azure:main' into main
nagkumar91 Jul 2, 2025
f871855
Update changelog and officially start freeze
Jul 2, 2025
59ac230
update the enum according to suggestions
Jul 2, 2025
794a2c4
update the changelog
Jul 2, 2025
b33363c
Finalize logic
Jul 2, 2025
464e2dd
Merge branch 'Azure:main' into main
nagkumar91 Jul 3, 2025
4585b14
Merge branch 'Azure:main' into main
nagkumar91 Jul 7, 2025
89c2988
Initial plan
Copilot Jul 7, 2025
6805018
Fix client request ID headers in azure-ai-evaluation
Copilot Jul 7, 2025
aad48df
Fix client request ID header format in rai_service.py
Copilot Jul 7, 2025
db75552
Merge pull request #5 from nagkumar91/copilot/fix-4
nagkumar91 Jul 10, 2025
b8eebf3
Merge branch 'Azure:main' into main
nagkumar91 Jul 10, 2025
2899ad4
Merge branch 'Azure:main' into main
nagkumar91 Jul 10, 2025
c431563
Merge branch 'Azure:main' into main
nagkumar91 Jul 17, 2025
79ed63c
Merge branch 'Azure:main' into main
nagkumar91 Jul 18, 2025
a3be3fc
Merge branch 'Azure:main' into main
nagkumar91 Jul 21, 2025
056ac4d
Passing threshold in AzureOpenAIScoreModelGrader
Jul 21, 2025
1779059
Add changelog
Jul 21, 2025
43fecff
Adding the self.pass_threshold instead of pass_threshold
Jul 21, 2025
b0c102b
Merge branch 'Azure:main' into main
nagkumar91 Jul 22, 2025
7bf5f1f
Add the python grader
Jul 22, 2025
3248ad0
Remove redundant test
Jul 22, 2025
d76f59b
Add class to exception list and format code
Jul 23, 2025
4d60e43
Merge branch 'main' into feature/python_grader
nagkumar91 Jul 24, 2025
98d1626
Merge branch 'Azure:main' into main
nagkumar91 Jul 24, 2025
9248c38
Add properties to evaluation upload run for FDP
Jul 24, 2025
74b760f
Remove debug
Jul 24, 2025
23dbc85
Merge branch 'feature/python_grader'
Jul 24, 2025
467ccb6
Remove the redundant property
Jul 24, 2025
c2beee8
Merge branch 'Azure:main' into main
nagkumar91 Jul 24, 2025
be9a19a
Fix changelog
Jul 24, 2025
de3a1e1
Fix the multiple features added section
Jul 24, 2025
f9faa61
removed the properties in update
Jul 24, 2025
69e783a
Merge branch 'Azure:main' into main
nagkumar91 Jul 28, 2025
8ebea2a
Merge branch 'Azure:main' into main
nagkumar91 Jul 31, 2025
3f9c818
Merge branch 'Azure:main' into main
nagkumar91 Aug 1, 2025
3b3159c
Merge branch 'Azure:main' into main
nagkumar91 Aug 5, 2025
d78b834
Merge branch 'Azure:main' into main
nagkumar91 Aug 6, 2025
ae3fc52
Merge branch 'Azure:main' into main
nagkumar91 Aug 8, 2025
706c042
Merge branch 'Azure:main' into main
nagkumar91 Aug 11, 2025
f91ee63
Merge branch 'Azure:main' into main
nagkumar91 Aug 12, 2025
4b9c2b9
update the mapping
Aug 12, 2025
1165bdf
lint fix
Aug 13, 2025
@@ -1003,14 +1003,37 @@ def _preprocess_data(
input_data_df = _validate_and_load_data(
target, data, evaluators_and_graders, output_path, azure_ai_project, evaluation_name, tags
)
# Allow pre-target column mapping via evaluator_config["target"].
# Lets users map dataset columns to target params without renaming data.
if target is not None and "target" in evaluator_config:
raw_target_cfg = cast(Dict[str, Any], evaluator_config.get("target") or {})
target_mapping = cast(Dict[str, str], raw_target_cfg.get("column_mapping", raw_target_cfg))
if isinstance(target_mapping, dict) and len(target_mapping) > 0:
# Only allow ${data.*} references here
_data_ref = r"^\$\{data\.[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\}$"
invalid = [v for v in target_mapping.values() if not (isinstance(v, str) and re.match(_data_ref, v))]
if invalid:
msg = "Only ${data.*} references are allowed in target " "column_mapping."
raise EvaluationException(
message=msg,
internal_message=msg,
target=ErrorTarget.EVALUATE,
category=ErrorCategory.INVALID_VALUE,
blame=ErrorBlame.USER_ERROR,
)
input_data_df = _apply_column_mapping(input_data_df, target_mapping)
if target is not None:
_validate_columns_for_target(input_data_df, target)

# extract column mapping dicts into dictionary mapping evaluator name to column mapping
# Extract evaluator name to column mapping (exclude special "target")
column_mapping = _process_column_mappings(
{
evaluator_name: evaluator_configuration.get("column_mapping", None)
for evaluator_name, evaluator_configuration in evaluator_config.items()
for (
evaluator_name,
evaluator_configuration,
) in evaluator_config.items()
if evaluator_name != "target"
}
)

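For reviewers, here is a rough usage sketch of the pre-target mapping added in the hunk above. The data file, column names, target, and evaluator are illustrative assumptions, not part of this PR; the one rule carried over from the diff is that the "target" entry accepts ${data.*} references only.

# Hypothetical caller-side usage of evaluator_config["target"]["column_mapping"].
# "questions.jsonl" and the "question_text" column are assumed for illustration.
from azure.ai.evaluation import evaluate

def answer_target(query: str) -> dict:
    # Stand-in target; a real target would call a model or service.
    return {"response": f"echo: {query}"}

def length_evaluator(response: str) -> dict:
    # Stand-in evaluator so the sketch stays self-contained.
    return {"length": len(response)}

result = evaluate(
    data="questions.jsonl",
    target=answer_target,
    evaluators={"length": length_evaluator},
    evaluator_config={
        # Map the dataset column onto the target parameter without renaming data.
        # Anything other than a ${data.*} reference raises an EvaluationException
        # (ErrorCategory.INVALID_VALUE, ErrorBlame.USER_ERROR).
        "target": {"column_mapping": {"query": "${data.question_text}"}},
    },
)

With this change, the mapping is applied to the loaded dataframe before _validate_columns_for_target runs, so the target's parameter names can be satisfied without editing the source file.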
@@ -1028,33 +1051,36 @@ def _preprocess_data(
batch_run_data: Union[str, os.PathLike, pd.DataFrame] = data

def get_client_type(evaluate_kwargs: Dict[str, Any]) -> Literal["run_submitter", "pf_client", "code_client"]:
"""Determines the BatchClient to use from provided kwargs (_use_run_submitter_client and _use_pf_client)"""
_use_run_submitter_client = cast(Optional[bool], kwargs.pop("_use_run_submitter_client", None))
_use_pf_client = cast(Optional[bool], kwargs.pop("_use_pf_client", None))

if _use_run_submitter_client is None and _use_pf_client is None:
"""
Determine which BatchClient to use from kwargs
(_use_run_submitter_client and _use_pf_client)
"""
use_submitter = cast(Optional[bool], kwargs.pop("_use_run_submitter_client", None))
use_pf = cast(Optional[bool], kwargs.pop("_use_pf_client", None))

if use_submitter is None and use_pf is None:
# If both are unset, return default
return "run_submitter"

if _use_run_submitter_client and _use_pf_client:
if use_submitter and use_pf:
raise EvaluationException(
message="Only one of _use_pf_client and _use_run_submitter_client should be set to True.",
message=("Only one of _use_pf_client and _use_run_submitter_client " "should be set to True."),
target=ErrorTarget.EVALUATE,
category=ErrorCategory.INVALID_VALUE,
blame=ErrorBlame.USER_ERROR,
)

if _use_run_submitter_client == False and _use_pf_client == False:
if (use_submitter is False) and (use_pf is False):
return "code_client"

if _use_run_submitter_client:
if use_submitter:
return "run_submitter"
if _use_pf_client:
if use_pf:
return "pf_client"

if _use_run_submitter_client is None and _use_pf_client == False:
if use_submitter is None and (use_pf is False):
return "run_submitter"
if _use_run_submitter_client == False and _use_pf_client is None:
if (use_submitter is False) and use_pf is None:
return "pf_client"

assert False, "This should be impossible"
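As a side note, the flag handling above reduces to a small decision table. The following standalone sketch mirrors that logic for the private _use_run_submitter_client / _use_pf_client kwargs; it is an illustration, not the SDK's code.

from typing import Literal, Optional

def pick_client(
    use_submitter: Optional[bool], use_pf: Optional[bool]
) -> Literal["run_submitter", "pf_client", "code_client"]:
    # Both unset: default to the run submitter.
    if use_submitter is None and use_pf is None:
        return "run_submitter"
    # Both True is contradictory and rejected.
    if use_submitter and use_pf:
        raise ValueError("Only one of the two flags may be set to True.")
    # Both explicitly False: fall back to the in-process code client.
    if use_submitter is False and use_pf is False:
        return "code_client"
    if use_submitter:
        return "run_submitter"
    if use_pf:
        return "pf_client"
    # Remaining cases: one flag is False, the other unset.
    return "run_submitter" if use_pf is False else "pf_client"

assert pick_client(None, None) == "run_submitter"
assert pick_client(None, False) == "run_submitter"
assert pick_client(False, None) == "pf_client"
assert pick_client(False, False) == "code_client"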
@@ -1066,17 +1092,23 @@ def get_client_type(evaluate_kwargs: Dict[str, Any]) -> Literal["run_submitter",
batch_run_data = input_data_df
elif client_type == "pf_client":
batch_run_client = ProxyClient(user_agent=UserAgentSingleton().value)
# Ensure the absolute path is passed to pf.run, as relative path doesn't work with
# multiple evaluators. If the path is already absolute, abspath will return the original path.
# Ensure the absolute path is passed to pf.run, as relative path
# doesn't work with multiple evaluators. If the path is already
# absolute, abspath will return the original path.
batch_run_data = os.path.abspath(data)
elif client_type == "code_client":
batch_run_client = CodeClient()
batch_run_data = input_data_df

# If target is set, apply 1-1 column mapping from target outputs to evaluator inputs
# If target is set, map target outputs to evaluator inputs (1-1)
if data is not None and target is not None:
input_data_df, target_generated_columns, target_run = _apply_target_to_data(
target, batch_run_data, batch_run_client, input_data_df, evaluation_name, **kwargs
target,
batch_run_data,
batch_run_client,
input_data_df,
evaluation_name,
**kwargs,
)

# IMPORTANT FIX: For ProxyClient, create a temporary file with the complete dataframe
@@ -1099,7 +1131,12 @@ def get_client_type(evaluate_kwargs: Dict[str, Any]) -> Literal["run_submitter",
target_reference = f"${{data.{Prefixes.TSG_OUTPUTS}{col}}}"

# We will add our mapping only if customer did not map target output.
if col not in mapping and target_reference not in mapped_to_values:
run_outputs_reference = f"${{run.outputs.{col}}}"
if (
col not in mapping
and target_reference not in mapped_to_values
and run_outputs_reference not in mapped_to_values
):
column_mapping[evaluator_name][col] = target_reference

# Don't pass the target_run since we're now using the complete dataframe
@@ -1121,7 +1158,12 @@ def get_client_type(evaluate_kwargs: Dict[str, Any]) -> Literal["run_submitter",
target_reference = f"${{data.{Prefixes.TSG_OUTPUTS}{col}}}"

# We will add our mapping only if customer did not map target output.
if col not in mapping and target_reference not in mapped_to_values:
run_outputs_reference = f"${{run.outputs.{col}}}"
if (
col not in mapping
and target_reference not in mapped_to_values
and run_outputs_reference not in mapped_to_values
):
column_mapping[evaluator_name][col] = target_reference

# After we have generated all columns, we can check if we have everything we need for evaluators.
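The last two hunks add run_outputs_reference so the default mapping is skipped when the customer already points an evaluator input at a target output through the ${run.outputs.*} form. A small sketch of that rule follows; the evaluator name, column names, and the "__outputs." prefix value are assumptions for illustration only.

# Dedup rule: for each target-generated column, add the "${data.__outputs.*}"
# mapping only when the customer mapped neither that reference nor the
# matching "${run.outputs.*}" reference for the same column.
target_generated_columns = {"response"}  # assumed target output column
column_mapping = {
    "relevance": {
        "query": "${data.query}",
        "answer": "${run.outputs.response}",  # customer-supplied mapping
    },
}
OUTPUTS_PREFIX = "__outputs."  # assumed value of Prefixes.TSG_OUTPUTS

for evaluator_name, mapping in column_mapping.items():
    mapped_to_values = set(mapping.values())
    for col in target_generated_columns:
        target_reference = f"${{data.{OUTPUTS_PREFIX}{col}}}"
        run_outputs_reference = f"${{run.outputs.{col}}}"
        if (
            col not in mapping
            and target_reference not in mapped_to_values
            and run_outputs_reference not in mapped_to_values
        ):
            mapping[col] = target_reference

print(column_mapping)  # unchanged: "answer" already covers the target output

Here "answer" already maps to ${run.outputs.response}, so no extra ${data.__outputs.response} entry is added; before this change the guard only checked the ${data.__outputs.*} form, which could leave a redundant mapping.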