Agentic task generation #37

kohankhaki · 2025-08-26T17:50:38Z

PR Type

Feature

Short Description

This PR adds a new agentic system for task generation. It also introduces a structured JSON prompt/response contract that includes the model's thoughts, and it integrates Langfuse for logging LLM outputs and key events.

Tests Added

None



afkanpour

@afkanpour reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @kohankhaki)

src/utils/agentic_prompts.py line 209 at r1 (raw file):

Please return your proposal and your thoughts and reasoning in the following format:
{{
  "thought": "Your reasoning and thought process about the kind of tasks you're proposing",

"Thought": "Your reasoning and thought process for designing the tasks and ensuring diversity in content and difficulty of tasks"

Code quote:

"Your reasoning and thought process about the kind of tasks you're proposing"

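The doubled braces in the quoted prompt suggest it is a Python `str.format` template, where `{{` and `}}` render as literal braces. Below is a minimal sketch under that assumption. It shows how such a template renders and how a reply in the requested `thought`/`problems` shape could be parsed. `TASK_PROMPT`, the `{capability}` placeholder, and `parse_task_proposal` are hypothetical names, not code from this PR.

```python
import json

# Hypothetical template in the style quoted above: doubled braces survive
# str.format() as literal "{" and "}", while {capability} is substituted.
TASK_PROMPT = """Propose tasks for the capability: {capability}
Please return your proposal and your thoughts and reasoning in the following format:
{{
  "thought": "Your reasoning and thought process for designing the tasks",
  "problems": {{
    "problem_0": "PROBLEM_1_DESCRIPTION",
    ...
  }}
}}"""


def parse_task_proposal(raw: str) -> tuple[str, list[str]]:
    """Parse a task-proposal reply into (thought, problem texts in index order)."""
    data = json.loads(raw)
    thought = data["thought"]
    problems = data["problems"]
    # Sort by the numeric suffix so "problem_10" comes after "problem_9".
    keys = sorted(problems, key=lambda k: int(k.rsplit("_", 1)[1]))
    return thought, [problems[k] for k in keys]


prompt = TASK_PROMPT.format(capability="modular arithmetic")
reply = '{"thought": "Vary difficulty.", "problems": {"problem_1": "B", "problem_0": "A"}}'
thought, problems = parse_task_proposal(reply)
```

Sorting on the numeric suffix rather than the raw key keeps the order stable once a proposal has ten or more problems.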
src/utils/agentic_prompts.py line 211 at r1 (raw file):

  "thought": "Your reasoning and thought process about the kind of tasks you're proposing",
  "problems": {{
    "problem_0": "TASK_TEXT_1",

These could be replaced with "PROBLEM_1_DESCRIPTION"

Code quote:

TASK_TEXT_1

src/utils/agentic_prompts.py line 242 at r1 (raw file):

    "solution_1": "SOLUTION_TEXT_2",
    ...
  }}

We should give one problem at a time for solving, so I expect the solution JSON to contain only one solution.

We should add a sentence to the prompt asking for the final numerical solution, so that parsing and verification become easy.

Code quote:

  "solutions": {{
    "solution_0": "SOLUTION_TEXT_1",
    "solution_1": "SOLUTION_TEXT_2",
    ...
  }}
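The reviewer makes two suggestions: send one problem per request, and ask explicitly for a final numerical answer. Together they could look roughly like the sketch below. It assumes a hypothetical `SOLVE_PROMPT` template and a `call_model` callable that returns the model's raw JSON string. The field names `solution` and `final_answer` are illustrative, not the PR's actual schema.

```python
import json
from typing import Callable

# Hypothetical single-problem prompt: one problem per request and an
# explicit request for the final numerical answer.
SOLVE_PROMPT = """Solve the following problem.
{problem}
Return JSON of the form {{"solution": "...", "final_answer": "..."}}.
State the final numerical answer in "final_answer"."""


def solve_each(problems: list[str], call_model: Callable[[str], str]) -> list[dict]:
    """Send problems one at a time and expect exactly one solution per reply."""
    results = []
    for problem in problems:
        reply = json.loads(call_model(SOLVE_PROMPT.format(problem=problem)))
        if "solution" not in reply or "final_answer" not in reply:
            raise ValueError(f"Malformed reply for problem: {problem!r}")
        results.append(reply)
    return results
```

Because each reply holds a single solution, there is no `solution_0`/`solution_1` mapping to reconcile with the problem list.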

kohankhaki

Reviewable status: 0 of 27 files reviewed, 3 unresolved discussions (waiting on @afkanpour)

src/utils/agentic_prompts.py line 209 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

"Thought": "Your reasoning and thought process for designing the tasks and ensuring diversity in content and difficulty of tasks"

Done.

src/utils/agentic_prompts.py line 211 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

These could be replaced with "PROBLEM_1_DESCRIPTION"

Done.

src/utils/agentic_prompts.py line 242 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

We should give one problem at a time for solving, so I expect the solution JSON to contain only one solution.

We should add a sentence to the prompt asking for the final numerical solution, so that parsing and verification become easy.

Done.

afkanpour

@afkanpour reviewed 27 of 27 files at r2.
Reviewable status: 17 of 27 files reviewed, 9 unresolved discussions (waiting on @kohankhaki)

src/task_solver/moderator.py line 49 at r2 (raw file):

        num_solvers: int,
        max_rounds: int,
        output_dir: Path,

Please add a description for the class attributes in the docstring.

Code quote:

        model_client: ChatCompletionClient,
        num_solvers: int,
        max_rounds: int,
        output_dir: Path,
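The sketch below documents the attributes with a Google-style `Attributes:` section. The class body and the attribute descriptions are assumptions inferred from the parameter names in the quote. `model_client` is typed as `Any` so the sketch runs without the client library that provides `ChatCompletionClient`.

```python
from pathlib import Path
from typing import Any


class Moderator:
    """Moderates a debate between solver agents (illustrative sketch).

    Attributes:
        model_client: Chat-completion client the moderator uses to judge
            solver answers (typed as Any here; the quoted signature uses
            ChatCompletionClient).
        num_solvers: Number of solver agents taking part in the debate.
        max_rounds: Upper bound on debate rounds before the moderator
            stops without consensus.
        output_dir: Directory where debate transcripts and final
            solutions are written.
    """

    def __init__(
        self, model_client: Any, num_solvers: int, max_rounds: int, output_dir: Path
    ) -> None:
        # Reject configurations that cannot produce a debate at all.
        if num_solvers < 1:
            raise ValueError("num_solvers must be at least 1")
        if max_rounds < 1:
            raise ValueError("max_rounds must be at least 1")
        self.model_client = model_client
        self.num_solvers = num_solvers
        self.max_rounds = max_rounds
        self.output_dir = output_dir
```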

src/cfg/agentic_config.yaml line 11 at r2 (raw file):

# Debate configuration (shared across all stages)
debate_cfg:
  max_round: 5

Is this the number of rounds of debate between two agents? Isn't 5 too large?

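If the debate stops as soon as the solvers agree, it will not always run for the full `max_round` cap. Below is a minimal sketch of that idea. It assumes each solver is a hypothetical callable that sees the answers from the previous round, and that consensus means identical answers. The PR's actual moderator logic may differ.

```python
from typing import Callable, Optional

# Hypothetical solver signature: (problem, answers from the previous round) -> answer.
Solver = Callable[[str, list[str]], str]


def run_debate(
    problem: str, solvers: list[Solver], max_round: int
) -> tuple[Optional[str], int]:
    """Run at most max_round rounds, stopping early once all solvers agree.

    Returns (consensus answer or None, number of rounds actually run).
    """
    previous: list[str] = []
    for round_idx in range(1, max_round + 1):
        # Every solver sees the full set of answers from the previous round.
        answers = [solve(problem, previous) for solve in solvers]
        if len(set(answers)) == 1:  # consensus: identical answers
            return answers[0], round_idx
        previous = answers
    return None, max_round
```

With early stopping, a cap of 5 only costs extra rounds on problems where the solvers keep disagreeing.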

src/utils/agentic_prompts.py line 285 at r2 (raw file):

Provide your solution in JSON format with the following structure:
- thought: Your detailed reasoning and step-by-step solution process
- final_answer: Your complete answer with explanation

Should we remove the requirement for 'explanation' in the final answer?

Code quote:

with explanation

src/utils/agentic_prompts.py line 286 at r2 (raw file):

- thought: Your detailed reasoning and step-by-step solution process
- final_answer: Your complete answer with explanation
- numerical_answer: The final numerical result (if applicable, otherwise null)

Having both final_answer and numerical_answer could be confusing.
I suggest we provide only one field for the final solution.

Code quote:

numerical_answer:
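The sketch below parses a reply with the three fields listed in the prompt. It keeps `numerical_answer` optional, matching the prompt's "if applicable, otherwise null". `SolverOutput` and `parse_solver_output` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SolverOutput:
    thought: str
    final_answer: str
    numerical_answer: Optional[float]  # None when the answer is not numeric


def parse_solver_output(raw: str) -> SolverOutput:
    data = json.loads(raw)
    num = data.get("numerical_answer")  # missing key or JSON null -> None
    return SolverOutput(
        thought=data["thought"],
        final_answer=data["final_answer"],
        # Numbers or numeric strings become float; anything else raises ValueError.
        numerical_answer=None if num is None else float(num),
    )
```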

src/cfg/agentic_config.yaml line 7 at r2 (raw file):

global_cfg:
  domain: math
  output_dir: /fs01/projects/aieng/public/ace/agentic_outputs/

Curious where this is specified?

Code quote:

/fs01/projects/aieng/public/ace/

src/task_solver/generator.py line 66 at r2 (raw file):

                        seed=cfg.agents.moderator.get("seed"),
                    ),
                    num_solvers=2,

Can this be specified in the config? How hard is it to change the logic to work with >2 solvers?


README.md line 89 at r2 (raw file):

# Generate tasks for each capability
python -m src.agentic_task_generator

Where is the capability for which tasks are to be generated specified? Please add a comment for that in the README.

Code quote:

agentic_task_generator

README.md line 92 at r2 (raw file):

# Generate tasks for all capabilities
python -m src.agentic_task_generator pipeline_tags.capabilities_tag=_20250902_030203

Is this tag auto-generated by a previous job (for example, capability generator)? Please explain in the README how this tag should be specified.

In general the README file should provide sufficient information for running all steps easily by someone unfamiliar with the codebase.

Code quote:

_20250902_030203

README.md line 95 at r2 (raw file):

# Generate solutions for tasks using multi-agent debate
python -m src.agentic_task_solver pipeline_tags.tasks_tag=_20250905_153532

ditto

Code quote:

_20250905_153532
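The tags in these commands (`_20250902_030203`, `_20250905_153532`) look like run timestamps in `_YYYYMMDD_HHMMSS` form. Assuming that is how they are produced, the sketch below shows how such a tag could be built, parsed back, and passed to the solver command from the README. `make_run_tag`, `parse_run_tag`, and `solver_command` are hypothetical helpers, not part of the codebase.

```python
from datetime import datetime

# Assumed tag format, inferred from the example tags only.
TAG_FORMAT = "_%Y%m%d_%H%M%S"


def make_run_tag(now: datetime) -> str:
    """Build a tag in the shape seen above, e.g. "_20250902_030203"."""
    return now.strftime(TAG_FORMAT)


def parse_run_tag(tag: str) -> datetime:
    """Recover the timestamp from a tag such as "_20250905_153532"."""
    return datetime.strptime(tag, TAG_FORMAT)


def solver_command(tasks_tag: str) -> str:
    """Assemble the README's task-solver invocation for a given tasks tag."""
    return f"python -m src.agentic_task_solver pipeline_tags.tasks_tag={tasks_tag}"
```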

kohankhaki · 2025-11-04T23:24:28Z

src/cfg/agentic_config.yaml line 7 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Curious where this is specified?

/fs01/projects/aieng/public/ace/ needs to be set in output_dir. I intentionally set it to agentic_outputs/ so that someone new to the repo does not accidentally make changes to our primary storage.

kohankhaki · 2025-11-04T23:24:46Z

src/cfg/agentic_config.yaml line 11 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Is this the number of rounds of debate between two agents? Isn't 5 too large?

This is just a placeholder for now; we can change these values during the experiments. That said, with 3 rounds the agents did not reach consensus.

kohankhaki · 2025-11-04T23:25:02Z

src/task_solver/generator.py line 66 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Can this be specified in the config? How hard is it to change the logic to work with >2 solvers?

It is not easy; it would need a lot of refactoring.


kohankhaki · 2025-11-04T23:27:05Z

src/utils/agentic_prompts.py line 285 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Should we remove the requirement for 'explanation' in the final answer?

Not sure about this. I guess we need to run experiments to finalize these details.

kohankhaki · 2025-11-04T23:28:22Z

src/utils/agentic_prompts.py line 286 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Having both final_answer and numerical_answer could be confusing.
I suggest we provide only one field for the final solution.

Not having it had its own complications as well. This way, it is easier to evaluate the end result when it is numerical. I'd suggest we revisit these details later, once we run experiments and find the best setting.
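A separate numerical field makes evaluation easier because a numeric answer can be compared against a reference within a tolerance, with no free-text parsing. Below is a minimal sketch using `math.isclose`. `numeric_match` and its default tolerances are assumptions for illustration.

```python
import math
from typing import Optional


def numeric_match(
    predicted: Optional[float],
    reference: float,
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> bool:
    """Return True if a solver's numerical answer matches the reference."""
    if predicted is None:
        # Non-numeric answers need a different check (e.g. a model-based judge).
        return False
    return math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=abs_tol)
```

The tolerance absorbs floating-point noise (for example `0.1 + 0.2` versus `0.3`), which an exact string comparison of `final_answer` would not.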

kohankhaki

Reviewable status: 17 of 27 files reviewed, 9 unresolved discussions (waiting on @afkanpour)

README.md line 89 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Where is the capability for which tasks are to be generated specified? Please add a comment for that in the README.

Done.

README.md line 92 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Is this tag auto-generated by a previous job (for example, capability generator)? Please explain in the README how this tag should be specified.

In general the README file should provide sufficient information for running all steps easily by someone unfamiliar with the codebase.

Done.

README.md line 95 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

ditto

Done.

src/task_solver/moderator.py line 49 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Please add a description for the class attributes in the docstring.

Done.

afkanpour

@afkanpour reviewed 7 of 10 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @kohankhaki)

src/cfg/agentic_config.yaml line 7 at r2 (raw file):

Previously, kohankhaki (Farnaz Kohankhaki) wrote…

/fs01/projects/aieng/public/ace/ needs to be set in output_dir. I intentionally set it to agentic_outputs/ so that someone new to the repo does not accidentally make changes to our primary storage.

So if someone wants to run the pipeline, should they add /fs01/projects/aieng/public/ace/ to the config file? If so, I suggest we simply hard-code it there for now to make runs easier for everyone. If different paths have to be specified in the config, let's define a base_dir or root_dir somewhere in the config and have the code prepend it to the relative paths.
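The `base_dir`/`root_dir` suggestion could be implemented roughly as in the sketch below. It assumes the config keeps `output_dir` relative (for example `agentic_outputs/`) and adds a hypothetical `base_dir` key. `resolve_output_dir` is an illustrative name. The sketch uses `PurePosixPath` because the cluster paths in this thread are POSIX paths.

```python
from pathlib import PurePosixPath


def resolve_output_dir(base_dir: str, output_dir: str) -> PurePosixPath:
    """Join a configurable root with a relative output directory.

    An absolute output_dir is returned unchanged, so an explicit full path
    in the config still takes precedence over base_dir.
    """
    out = PurePosixPath(output_dir)
    return out if out.is_absolute() else PurePosixPath(base_dir) / out
```

With this split, newcomers only change `base_dir` to point at shared storage, and every stage's relative output path follows automatically.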

src/task_solver/generator.py line 66 at r2 (raw file):

Previously, kohankhaki (Farnaz Kohankhaki) wrote…

It is not easy; it would need a lot of refactoring.

OK

kohankhaki · 2025-11-07T20:50:21Z

@afkanpour Added a comment in src/cfg/agentic_config.yaml regarding the output_dir.

afkanpour

Reviewable status: 26 of 27 files reviewed, 2 unresolved discussions (waiting on @kohankhaki)

adding refactored task generation. updated prompts to ask for json outputs, and updated corresponding output parser.

f0cf760

kohankhaki requested a review from afkanpour August 26, 2025 17:50

kohankhaki closed this Aug 26, 2025

kohankhaki reopened this Aug 26, 2025

afkanpour requested changes Aug 27, 2025


kohankhaki added 11 commits September 5, 2025 01:03

fixed retry, json processing, and max token.

06da910

Merge branch 'fix-anthropic-client' into agentic_task_gen

70d7b06

switching to two-phase task generation.

0ca1c22

switching to two-phase task generation. part 2.

396feac

updated agentic config and readme.

b166e4c

simplified task generations.

084b68c

simplified task generation.

c155d74

fixed mypy errors.

52b4d2a

ruff fix.

d1e1812

updated saved file name for solutions.

4d237f7

added extra details to agent solution messages.

38d825d

kohankhaki commented Oct 9, 2025


kohankhaki added 5 commits October 9, 2025 12:20

fixed prompts.

c5afb81

fixed output dir name to include area name.

9195b93

fixed task solver output dir name.

57d2d2a

upgraded json handling, and model call.

3292299

updated readme to include latest agentic changes.

df4860b

afkanpour requested changes Oct 31, 2025


fixed readme to include info on output and input paths for agentic pipeline.

3ca8acf

added attribute desc for docstrings for task solver classes.

4ef564f

kohankhaki commented Nov 4, 2025


afkanpour requested changes Nov 7, 2025


added a comment on output_dir.

5687cb4

afkanpour approved these changes Nov 7, 2025


Merge branch 'main' into agentic_task_gen

a379ecc

kohankhaki merged commit eea799f into main Nov 7, 2025
1 of 2 checks passed


kohankhaki commented Aug 26, 2025 • edited by afkanpour


3 participants
