Do Erdos rollouts have access to the parent construction during evaluation? #15

@cheongalc

Dear authors, thank you for releasing the TTT-Discover code. I've been studying it while working on a reimplementation.

For the Erdos minimum overlap task, I noticed that ErdosMinOverlapRewardEvaluator (in examples/erdos_min_overlap/env.py) does not set self.verifier_src. The base class SandboxRewardEvaluator defaults it to None (line 437 of environments/sandbox_reward_evaluator.py).

This means that, for the Erdos environment, preprocess_generation returns the raw generated code unchanged:

def preprocess_generation(self, generation, state) -> str:
    import inspect
    if self.verifier_src is None:
        return generation          # <-- always takes this path for Erdos

    # Is everything below this point unreachable for Erdos?
    verifier_src = inspect.getsource(self.verifier_src)
    numpy_import = "import numpy as np"

    base = numpy_import + "\n\n" + verifier_src + "\n\n"

    ...
    if state.construction is not None:
        initial_h_values = f"initial_h_values = np.array({list(state.construction)!r})"
        base += initial_h_values + "\n\n"
    return base + generation

As a result, initial_h_values (the parent construction) and verify_c5_solution are never injected into the execution environment for Erdos rollouts. However, the prompt (in get_question) tells the model:

evaluate_erdos_solution() and initial_h_values (an initial construction, if available) are pre-imported

The AC inequalities env does set self.verifier_src (to evaluate_sequence_ac1 / evaluate_sequence_ac2), so the injection works there.
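To make the mismatch concrete, here is a minimal sketch of what I believe happens. It is a toy stand-in for the quoted method, not the repository's code: the free function, the `run` helper, the sample generation, and the one-line verifier string are all hypothetical, and a plain list replaces `np.array` so the sketch runs without NumPy.

```python
def preprocess_generation(generation, verifier_src=None, construction=None):
    """Toy stand-in: prepend a preamble only when verifier_src is set."""
    if verifier_src is None:
        return generation  # Erdos path as I read it: nothing is injected
    base = verifier_src + "\n\n"
    if construction is not None:
        # The quoted code wraps this in np.array(...); a list keeps the sketch NumPy-free.
        base += f"initial_h_values = {list(construction)!r}\n\n"
    return base + generation


def run(program):
    """Execute in a fresh namespace; return `result` or the exception's class name."""
    namespace = {}
    try:
        exec(program, namespace)
    except Exception as exc:
        return type(exc).__name__
    return namespace.get("result")


# Hypothetical model output that trusts the prompt's promise of a pre-imported name.
generation = "result = sum(initial_h_values)"
# Hypothetical verifier source; the real one would come from inspect.getsource(...).
verifier_src = "def evaluate_erdos_solution(h):\n    return sum(h)"
parent = (0.5, 0.25, 0.25)

without_injection = run(preprocess_generation(generation, None, parent))
with_injection = run(preprocess_generation(generation, verifier_src, parent))
print(without_injection)  # NameError
print(with_injection)     # 1.0
```

If the real Erdos evaluator behaves like the first call, any generation that follows the prompt literally would fail before it produces a construction.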

Could you clarify whether this is intentional for Erdos? For example:

  • Was there a separate Erdos evaluator used in your production runs that did set verifier_src?
  • Or does the Erdos task intentionally require the model to be fully self-contained (import its own numpy, define its own verifier, generate its own starting construction)?
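In case it helps to answer the second question from logged rollouts, below is a rough sketch of how I would check whether a generation is self-contained. The function name `relies_on_injection` and the `PROMISED` set are my own; the two names in the set are taken from the prompt text quoted above. It is a heuristic that ignores scoping, so it only flags names that are read somewhere and bound nowhere in the file.

```python
import ast

# Names the prompt says are pre-imported (taken from the quoted prompt text).
PROMISED = {"initial_h_values", "evaluate_erdos_solution"}


def relies_on_injection(source):
    """Return the promised names that `source` reads but never binds itself."""
    bound, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Load context means the name is read; Store/Del means it is bound here.
            (loaded if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
    return (loaded - bound) & PROMISED


# Hypothetical generations for illustration only.
dependent = "import numpy as np\nbest = evaluate_erdos_solution(initial_h_values)"
standalone = (
    "import numpy as np\n"
    "initial_h_values = np.ones(4)\n"
    "def evaluate_erdos_solution(h):\n"
    "    return h.sum()\n"
    "best = evaluate_erdos_solution(initial_h_values)"
)

print(sorted(relies_on_injection(dependent)))
print(sorted(relies_on_injection(standalone)))
```

A non-empty result for most successful rollouts would suggest the production evaluator did inject these names; an empty result would suggest the model was expected to be self-contained.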

I want to make sure my reimplementation matches your actual experimental setup. Any guidance would be appreciated. Thanks!
