Do Erdos rollouts have access to the parent construction during evaluation?

Dear authors, thank you for releasing the TTT-Discover code. I've been studying it while working on a reimplementation.

For the Erdos minimum overlap task, I noticed that `ErdosMinOverlapRewardEvaluator` (in `examples/erdos_min_overlap/env.py`) does not set `self.verifier_src`. The base class `SandboxRewardEvaluator` defaults it to `None` ([line 437 of `environments/sandbox_reward_evaluator.py`](https://github.com/test-time-training/discover/blob/e9657a10521df1e2c0a087b29a73f521182fa8e3/ttt_discover/environments/sandbox_reward_evaluator.py#L437)).

This means that in the [Erdos `env.py`](https://github.com/test-time-training/discover/blob/e9657a10521df1e2c0a087b29a73f521182fa8e3/ttt_discover/environments/sandbox_reward_evaluator.py#L437), `preprocess_generation` always returns the raw generated code unchanged:

```python
def preprocess_generation(self, generation, state) -> str:
    import inspect
    if self.verifier_src is None:
        return generation          # <-- always takes this path for Erdos

    # Unreachable for Erdos, since verifier_src is always None there
    verifier_src = inspect.getsource(self.verifier_src)
    numpy_import = "import numpy as np"

    base = numpy_import + "\n\n" + verifier_src + "\n\n"

    ...
    if state.construction is not None:
        initial_h_values = f"initial_h_values = np.array({list(state.construction)!r})"
        base += initial_h_values + "\n\n"
    return base + generation
```

As a result, `initial_h_values`, `verify_c5_solution`, and the parent construction are never injected into the execution environment for Erdos rollouts. However, the prompt (in `get_question`) tells the model:

> `evaluate_erdos_solution()` and `initial_h_values` (an initial construction, if available) are pre-imported

([Source](https://github.com/test-time-training/discover/blame/e9657a10521df1e2c0a087b29a73f521182fa8e3/examples/erdos_min_overlap/env.py#L186))

The AC inequalities env does set `self.verifier_src` (to `evaluate_sequence_ac1` / `evaluate_sequence_ac2`), so the injection works there.
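To make the concern concrete, here is a minimal sketch of the two paths as I understand them. It is a simplified stand-in, not the repo's code: the free function `preprocess_generation`, the `run` helper, `STUB_VERIFIER`, and the sample `GENERATION` are all hypothetical. To stay stdlib-only, it passes the verifier source as a string instead of calling `inspect.getsource`, and it skips the numpy import.

```python
def preprocess_generation(generation, construction, verifier_src=None):
    """Simplified stand-in for the method quoted above (assumption, not repo code)."""
    if verifier_src is None:
        return generation  # nothing is prepended to the model's code
    base = verifier_src + "\n\n"
    if construction is not None:
        base += f"initial_h_values = {list(construction)!r}\n\n"
    return base + generation


# Placeholder verifier: NOT the real overlap computation, only a name to inject.
STUB_VERIFIER = "def evaluate_erdos_solution(h_values):\n    return max(h_values)\n"

# Hypothetical model output that relies on the names promised by the prompt.
GENERATION = "result = evaluate_erdos_solution(initial_h_values)\n"


def run(program):
    """Execute a program string in a fresh namespace and report the outcome."""
    namespace = {}
    try:
        exec(program, namespace)
    except NameError as err:
        return f"NameError: {err}"
    return namespace["result"]


parent = [0.2, 0.8, 1.0]

# Erdos-like path: verifier_src is None, so the promised names do not exist.
without_verifier = run(preprocess_generation(GENERATION, parent))

# AC-like path: verifier_src is set, so the names are injected before the code.
with_verifier = run(preprocess_generation(GENERATION, parent, STUB_VERIFIER))

print(without_verifier)
print(with_verifier)
```

If the real sandbox behaves like the first path, any generation that trusts the prompt and uses `initial_h_values` or `evaluate_erdos_solution()` without defining them would fail with a `NameError` before producing a result.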

Could you clarify whether this is intentional for Erdos? For example:
- Was there a separate Erdos evaluator used in your production runs that did set `verifier_src`?
- Or does the Erdos task intentionally require the model to be fully self-contained (import its own numpy, define its own verifier, generate its own starting construction)?

I want to make sure my reimplementation matches your actual experimental setup. Any guidance would be appreciated. Thanks!

