AI-Hypercomputer
diff --git a/MaxText/experimental/agent/code_evaluation_agent/README.md b/MaxText/experimental/agent/code_evaluation_agent/README.md
diff --git a/MaxText/experimental/agent/code_evaluation_agent/code_evaluation_agent.py b/MaxText/experimental/agent/code_evaluation_agent/code_evaluation_agent.py
diff --git a/MaxText/experimental/agent/code_evaluation_agent/prompt_code_evaluation.py b/MaxText/experimental/agent/code_evaluation_agent/prompt_code_evaluation.py
@@ -0,0 +1,70 @@
+# Code Evaluation Agent
+
+This agent automates the evaluation of JAX code that has been converted from PyTorch. It works by generating and executing `pytest` test cases to compare the functional equivalence of the original PyTorch code and the converted JAX code. The agent leverages a large language model (Gemini) to create these test cases dynamically.
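
A minimal sketch of the kind of equivalence test the agent aims to generate may help here. It is illustrative only: `reference_softmax` and `converted_softmax` are hypothetical NumPy stand-ins for a PyTorch module and its JAX conversion, chosen so the example runs without either framework. The shapes, seeds, and tolerances are assumptions as well.

```python
import numpy as np


def reference_softmax(x):
  """Hypothetical stand-in for the original PyTorch module."""
  e = np.exp(x - x.max(axis=-1, keepdims=True))
  return e / e.sum(axis=-1, keepdims=True)


def converted_softmax(x):
  """Hypothetical stand-in for the converted JAX module (log-sum-exp form)."""
  shifted = x - x.max(axis=-1, keepdims=True)
  return np.exp(shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True)))


def outputs_match(shape, seed, rtol=1e-5, atol=1e-6):
  """Feeds the same random input to both implementations and compares outputs."""
  rng = np.random.default_rng(seed)
  x = rng.standard_normal(shape).astype(np.float32)
  return np.allclose(reference_softmax(x), converted_softmax(x), rtol=rtol, atol=atol)


def test_equivalence_on_random_inputs():
  # pytest collects this function; each shape/seed pair is one comparison.
  for seed, shape in enumerate([(2, 3), (4,), (5, 7)]):
    assert outputs_match(shape, seed)
```

Saved as a `test_*.py` file, this would be picked up by `pytest` and reported as a single passing or failing test case.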
+
+## Workflow
+
+1.  **File Pairing**: The agent identifies pairs of corresponding PyTorch and JAX files from specified input directories.
+2.  **Test Case Generation**: For each file pair, it prompts the Gemini model to generate a comprehensive `pytest` test case. The generated test compares the outputs of the PyTorch and JAX modules using randomized inputs to ensure they are numerically close (`numpy.allclose`).
+3.  **Test Execution**: The generated test case is saved as a Python file and executed using `pytest`.
+4.  **Result Aggregation**: The agent captures the results (pass/fail counts) from each test run.
+5.  **Reporting**: Finally, it calculates and logs two key metrics:
+    *   **Test Case Accuracy**: The percentage of individual test cases that passed across all files.
+    *   **File Accuracy**: The percentage of files for which all generated test cases passed.
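
The two metrics can be sketched as follows. This is an assumption-laden illustration, not the script's code: `summarize` and its `results` argument (a list of per-file `(passed, failed)` counts, with any error penalty already applied) are hypothetical names.

```python
def summarize(results):
  """Computes both metrics from per-file (passed, failed) counts.

  `results` is a hypothetical list of (num_passed, num_failed) tuples,
  one per evaluated file pair, with error penalties already applied.
  """
  total_passed = sum(p for p, _ in results)
  total_failed = sum(f for _, f in results)
  total_tests = total_passed + total_failed
  files_all_passed = sum(1 for _, f in results if f == 0)
  # Guard against empty input so the sketch never divides by zero.
  test_case_accuracy = 100 * total_passed / total_tests if total_tests else 0.0
  file_accuracy = 100 * files_all_passed / len(results) if results else 0.0
  return test_case_accuracy, file_accuracy


# Three files: one fully passing, one mixed, one that hit an error penalty of 10.
print(summarize([(4, 0), (3, 1), (0, 10)]))
```

Note how a single penalized file lowers Test Case Accuracy far more than File Accuracy, because the penalty is counted as that many failed test cases.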
+
+## File Descriptions
+
+-   **`code_evaluation_agent.py`**: The main executable script that orchestrates the entire evaluation process.
+-   **`prompt_code_evaluation.py`**: Contains the system and user prompt templates that instruct the Gemini model on how to generate the `pytest` test cases.
+-   **`utils.py`**: Provides helper functions, including `run_pytest_capture_output` to execute `pytest` and capture its results, and `get_last_defined_module` to identify the primary component in a code file.
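
Since `utils.py` is not part of this change, the following is only a guess at how a helper like `get_last_defined_module` could work, assuming "primary component" means the last top-level class or function in the file. The name `last_defined_name` is hypothetical, and the real helper may behave differently.

```python
import ast


def last_defined_name(source):
  """Returns the name of the last top-level class or function in `source`.

  Hypothetical stand-in for `get_last_defined_module`; the real helper in
  `utils.py` may behave differently.
  """
  tree = ast.parse(source)  # Parses only; the code is never executed or imported.
  names = [
      node.name
      for node in tree.body
      if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
  ]
  return names[-1] if names else None


pytorch_src = "import torch\n\nclass Helper:\n  pass\n\nclass Attention:\n  pass\n"
jax_src = "import jax\n\ndef helper():\n  pass\n\nclass Attention:\n  pass\n"
# The agent only proceeds when both files expose the same entry point.
print(last_defined_name(pytorch_src) == last_defined_name(jax_src))
```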
+
+## Setup
+
+1.  **Install Dependencies**:
+    Make sure you have the required Python packages installed.
+    ```bash
+    pip install pytest google-generativeai backoff python-dotenv
+    ```
+
+2.  **Configure Environment Variables**:
+    This agent uses the `GeminiAgent` from the `code_generation_agent`, which requires a `.env` file in the `code_generation_agent` directory.
+
+    ```.env
+    # in MaxText/experimental/agent/code_generation_agent/.env
+    GOOGLE_API_KEY="YOUR_API_KEY_HERE"
+    Model="gemini-2.5-pro"
+    ```
+
+3.  **Configure Paths**:
+    `code_evaluation_agent.py` takes its paths from the command-line flags `--pytorch_path`, `--jax_path`, and `--testcase_path`. The defaults are shown below; override them on the command line to point to your datasets. The script creates the test case directory if it doesn't exist.
+
+    ```python
+    # defaults in code_evaluation_agent.py
+    pytorch_path="../code_generation_agent/dataset/PyTorch/"
+    jax_path="../code_generation_agent/dataset/jax_converted/"
+    testcase_path="../code_generation_agent/dataset/test_cases/"
+    ```
+
+## Usage
+
+Before running the agent, ensure you have:
+
+1.  Your original PyTorch files in the directory specified by `pytorch_path`.
+2.  The corresponding converted JAX files in the directory specified by `jax_path`. The filenames must match between the two directories.
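
The filename-matching requirement can be illustrated with a small sketch. It is a simplified stand-in for the pairing step, run on throwaway directories; `pair_files` and the example filenames are hypothetical.

```python
import os
import tempfile


def pair_files(pytorch_dir, jax_dir):
  """Returns sorted (pytorch_file, jax_file) pairs for filenames present in both."""
  # Skip dunder files such as __init__.py on the JAX side.
  jax_names = {n for n in os.listdir(jax_dir) if not n.startswith("__")}
  common = sorted(set(os.listdir(pytorch_dir)) & jax_names)
  return [(os.path.join(pytorch_dir, n), os.path.join(jax_dir, n)) for n in common]


# Demonstrate on throwaway directories with hypothetical filenames.
with tempfile.TemporaryDirectory() as root:
  torch_dir = os.path.join(root, "PyTorch")
  jax_dir = os.path.join(root, "jax_converted")
  os.makedirs(torch_dir)
  os.makedirs(jax_dir)
  for name in ("attention.py", "mlp.py"):
    open(os.path.join(torch_dir, name), "w").close()
  for name in ("attention.py", "__init__.py"):
    open(os.path.join(jax_dir, name), "w").close()
  paired_names = [os.path.basename(p) for p, _ in pair_files(torch_dir, jax_dir)]
  print(paired_names)
```

Only `attention.py` is paired here: `mlp.py` has no JAX counterpart and would be silently skipped, so a missing conversion does not show up in the accuracy metrics.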
+
+To start the evaluation process, run the following command from within the `code_evaluation_agent` directory:
+
+```bash
+python code_evaluation_agent.py
+```
+
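+The agent processes each file pair, generates tests, runs them, and logs the progress and final accuracy metrics to the console. Test case files that already exist in the test case directory are reused unless you pass `--overwrite_existing_files`.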
+
+## Output
+
+The agent provides real-time logging for each file being processed. At the end of the run, it prints a summary of the results, including:
+
+- The number of files that passed all tests.
+- The number of files in which no test passed.
+- The overall **Test Case Accuracy**.
+- The overall **File Accuracy**.
@@ -0,0 +1,206 @@
+"""
+Copyright 2025 Google LLC
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+     https://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+"""
+This file implements an agent that evaluates the correctness of JAX code
+generated from PyTorch code by running pytest test cases. It uses a language
+model to generate the test cases and captures the results of the tests.
+
+The agent performs the following steps:
+1. Reads pairs of PyTorch and JAX files from specified directories.
+2. For each pair, it generates a pytest-compatible test case using a language
+   model.
+3. It runs the generated test case and captures the output, including the number
+   of passed and failed tests.
+4. It logs the results and calculates overall accuracy metrics.
+
+Example Invocation:
+python code_evaluation_agent.py
+
+Ensure the paths to the PyTorch and JAX code directories are correctly set in
+the script. The script will create a directory for test cases if it doesn't
+exist and will overwrite existing test cases based on the `overwrite_existing_files`
+flag.
+
+Overall Accuracy Metrics:
+- Test Case Accuracy: The percentage of individual test cases that passed across
+  all generated tests.
+- File Accuracy: The percentage of files for which all generated test cases passed.
+
+Relevant Files:
+- `prompt_code_evaluation.py`: Contains the prompts used by the language model
+  for generating test cases.
+- `utils.py`: Provides utility functions such as `get_last_defined_module`
+  (to extract the main module from a code string) and `run_pytest_capture_output`
+  (to execute pytest and capture its results).
+- `code_generation_agent/llm_agent.py`: Contains the `GeminiAgent` class used
+  to interact with the language model.
+- `orchestration_agent/Utils.py`: Contains `parse_python_code` for extracting
+  code from LLM responses.
+"""
+import argparse
+import os, logging, sys
+from prompt_code_evaluation import CodeEvaluation
+from utils import get_last_defined_module, run_pytest_capture_output
+
+sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+from code_generation_agent.llm_agent import GeminiAgent
+from orchestration_agent.Utils import parse_python_code
+
+logging.basicConfig(
+    format="%(asctime)s %(levelname)-8s %(message)s",
+    level=logging.INFO,
+    datefmt="%Y-%m-%d %H:%M:%S",
+)
+logger = logging.getLogger(__name__)
+# logging.raiseExceptions = False
+
+
+parser = argparse.ArgumentParser(description="Code Evaluation Agent")
+parser.add_argument("--error_penalty", type=int, default=10, help="Penalty for errors in test case generation or execution.")
+parser.add_argument("--pytorch_path", type=str, default="../code_generation_agent/dataset/PyTorch/", help="Path to the directory containing PyTorch files.")
+parser.add_argument("--jax_path", type=str, default="../code_generation_agent/dataset/jax_converted/", help="Path to the directory containing JAX files.")
+parser.add_argument("--testcase_path", type=str, default="../code_generation_agent/dataset/test_cases/", help="Path to the directory for generated test cases.")
+parser.add_argument("--overwrite_existing_files", action="store_true", help="Overwrite existing test case files.")
+args = parser.parse_args()
+
+overwrite_existing_files = args.overwrite_existing_files
+error_penalty = args.error_penalty
+pytorch_path = args.pytorch_path
+jax_path = args.jax_path
+testcase_path = args.testcase_path
+os.makedirs(testcase_path, exist_ok=True)
+
+llm_agent = GeminiAgent(CodeEvaluation["SystemPrompt"])
+
+
+def get_file_pairs(pytorch_path, jax_path):
+  """Generates lists of file paths for PyTorch and JAX files that have a common name.
+
+  This function finds files with the same name in the specified PyTorch and JAX
+  directories, filtering out any files in the JAX directory that start with "__".
+
+  Args:
+      pytorch_path: The path to the directory containing PyTorch files.
+      jax_path: The path to the directory containing JAX files.
+
+  Returns:
+      A tuple containing two lists of strings:
+          - The first list contains the full paths to the common PyTorch files.
+          - The second list contains the full paths to the common JAX files.
+  """
+  pytorch_files = os.listdir(pytorch_path)
+  jax_files = list(filter(lambda x: not x.startswith("__"), os.listdir(jax_path)))
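+  # Sort for a deterministic processing order across runs.
+  common_files = sorted(set(pytorch_files).intersection(jax_files))
+  # os.path.join works whether or not the directory paths end with a separator.
+  return (
+      [os.path.join(pytorch_path, name) for name in common_files],
+      [os.path.join(jax_path, name) for name in common_files],
+  )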
+
+
+def make_test_case_and_run(python_file, jax_file):
+  """Generates a test case and runs it for a given PyTorch and JAX file pair.
+
+  This function uses a language model to generate a pytest-compatible test case
+  for a PyTorch and JAX code file pair. It then runs the test and captures the output.
+  If the files have inconsistent entry points or the test case cannot be generated,
+  a penalty is applied.
+
+  Args:
+      python_file: The path to the PyTorch code file.
+      jax_file: The path to the JAX code file.
+
+  Returns:
+      A tuple containing the number of passed and failed test cases.
+  """
+  response = None  # Defined up front so the except block can always log it.
+  try:
+    logger.info(f"Processing {python_file}")
+    out_file_path = os.path.join(testcase_path, python_file.split("/")[-1])
+    if overwrite_existing_files or not os.path.exists(out_file_path):
+      with open(python_file) as f:
+        python_code = f.read()
+      with open(jax_file) as f:
+        jax_code = f.read()
+      entry_module = get_last_defined_module(python_code)
+      if get_last_defined_module(jax_code) != entry_module:
+        logger.error(
+            f"Inconsistent entry module in {python_file}: PyTorch defines {entry_module}, "
+            f"JAX defines {get_last_defined_module(jax_code)}"
+        )
+        # Penalty when the entry point is missing or differs from the PyTorch one
+        return 0, error_penalty
+      prompt = CodeEvaluation["TESTCASE"]
+      python_code = (
+          "from " + ".".join(python_file.split("/")[1:]).replace(".py", " import " + entry_module) + "\n\n" + python_code
+      )
+      jax_code = "from " + ".".join(jax_file.split("/")[1:]).replace(".py", " import " + entry_module) + "\n\n" + jax_code
+      prompt = prompt.replace("<module.path.to.pytorch_code>", python_code)
+      prompt = prompt.replace("<module.path.to.jax_code>", jax_code)
+      prompt = prompt.replace("<function_or_class_to_call>", entry_module)
+      response = llm_agent(prompt)
+      generated_code = parse_python_code(response.text)
+      with open(out_file_path, "w") as f:
+        f.write("import os,sys\nsys.path.append(os.path.abspath('..'))\n")
+        f.write(generated_code)
+      logger.info("Written at %s", out_file_path)
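+      # `response` is the LLM response object, so search its text. The system prompt asks
+      # for `NOTESTCASE` on untestable code, so that marker is checked as well.
+      if "<UNABLETOGENERATE>" in response.text or "NOTESTCASE" in response.text:
+        return 0, error_penalty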
+    else:
+      logger.info("Test case file already exists; reusing it")
+    file = python_file.split("/")[-1]
+    output, exit_code, is_dependency_error, passed, failed = run_pytest_capture_output(file, code_folder=testcase_path)
+    return passed, failed
+  except Exception as e:
+    logger.error("Exception in code generation %s", e)
+    logger.error("The code file is %s", python_file.split("/")[-1])
+    logger.error("The generated Code is %s", response)
+    # Penalty in case of Exception
+    return 0, error_penalty
+
+
+def run_code_evaluation():
+  """Runs the full code evaluation process.
+
+  This function orchestrates the evaluation of PyTorch and JAX code file pairs.
+  It iterates through the common files, generates and runs a test case for each,
+  and then logs the results. It also calculates and prints the overall
+  test case and file accuracy.
+  """
+  total_passed, total_failed = 0, 0
+  all_passed, all_failed, total_files = 0, 0, 0
+  for python_file, jax_file in zip(*get_file_pairs(pytorch_path, jax_path)):
+    num_passed, num_failed = make_test_case_and_run(python_file, jax_file)
+    if num_passed == num_failed == 0: # when the code cannot be executed
+      # Penalty in case of issue in test case and not executed
+      num_failed = error_penalty
+    logger.info(f"{python_file.split('/')[-1]}: {num_passed} cases passed and {num_failed} cases failed")
+    total_passed += num_passed
+    total_failed += num_failed
+    if num_passed == 0:
+      all_failed += 1
+    if num_failed == 0:
+      all_passed += 1
+    total_files += 1
+
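+  logger.info("****** Results ******")
+  if total_files == 0 or total_passed + total_failed == 0:
+    # Avoid a ZeroDivisionError when there is nothing to score.
+    logger.warning("No test results to report; check the input directories.")
+    return
+  logger.info(f"{all_passed} files passed all tests; {all_failed} files passed no tests")
+  logger.info(f"Test case Accuracy {total_passed * 100 / (total_passed + total_failed):.2f}%")
+  logger.info(f"File Accuracy {all_passed * 100 / total_files:.2f}%")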
+
+
+if __name__ == "__main__":
+  run_code_evaluation()
@@ -0,0 +1,67 @@
+"""
+Copyright 2025 Google LLC
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+     https://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+"""
+This file contains the prompt templates used by the code evaluation agent.
+"""
+
+CodeEvaluation = {
+    "SystemPrompt": """You are an expert machine learning engineer and automated testing specialist with deep
+   knowledge of Python, NumPy, PyTorch, and JAX (including libraries such as Flax, Flax.nnx, and Optax).
+
+  You can:
+  - Convert code written in PyTorch, NumPy, or other frameworks into functionally equivalent JAX code using appropriate libraries.
+  - Analyze JAX-based code and generate meaningful test cases using `pytest`.
+  - When both PyTorch and JAX modules are provided, generate a comprehensive test suite that:
+  1. Validates the PyTorch module independently.
+  2. Validates the JAX module independently.
+  3. Compares their outputs across multiple randomized inputs using `numpy.allclose`.
+
+  Guidelines:
+  - Assume helper functions and classes not defined in the code are already implemented and available.
+  - Do not add or modify import statements unless they exist in the provided code.
+  - Only return test code (no explanations) unless explicitly asked.
+  - For trivial or untestable code, return `NOTESTCASE`.
+  - When comparing PyTorch and JAX:
+    - Accept `#torch_path` and `#jax_path` as import paths.
+    - Accept an optional `#entry_point` that identifies the function or class to invoke.
+    - Automatically generate randomized test inputs for shapes like `(2,3)`, `(4,)`, etc.
+    - Write clear assertions for:
+        - Output validity (no errors or exceptions)
+        - Output comparison (`np.allclose`)
+    """,
+    "TESTCASE": """#torch_path
+    <module.path.to.pytorch_code>
+
+    #jax_path
+    <module.path.to.jax_code>
+
+    #entry_point
+    <function_or_class_to_call>
+
+    #input_gen
+    <code to generate input tensors or arrays>
+
+    #torch_code
+    '''
+    <insert full PyTorch code here>
+    '''
+
+    #jax_code
+    '''
+    <insert full JAX code here>
+    '''""",
+}