Dear authors, thank you for releasing the TTT-Discover code. I've been studying it while working on a reimplementation.
For the Erdos minimum overlap task, I noticed that ErdosMinOverlapRewardEvaluator (in examples/erdos_min_overlap/env.py) does not set self.verifier_src. The base class SandboxRewardEvaluator defaults it to None (line 437 of environments/sandbox_reward_evaluator.py).
This means in the Erdos env.py, preprocess_generation returns the raw generated code unchanged:
def preprocess_generation(self, generation, state) -> str:
    import inspect
    if self.verifier_src is None:
        return generation  # <-- always takes this path for Erdos
    # then the code doesn't go here?
    verifier_src = inspect.getsource(self.verifier_src)
    numpy_import = "import numpy as np"
    base = numpy_import + "\n\n" + verifier_src + "\n\n"
    ...
    if state.construction is not None:
        initial_h_values = f"initial_h_values = np.array({list(state.construction)!r})"
        base += initial_h_values + "\n\n"
    return base + generation
As a result, initial_h_values, verify_c5_solution, and the parent construction are never given to the execution environment for Erdos rollouts. However, the prompt (in get_question) tells the model:
evaluate_erdos_solution() and initial_h_values (an initial construction, if available) are pre-imported
(Source)
The AC inequalities env does set self.verifier_src (to evaluate_sequence_ac1 / evaluate_sequence_ac2), so the injection works there.
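To make the concern concrete, here is a minimal standalone sketch of the two paths as I understand them. It is not the repository's code: build_program, STUB_VERIFIER_SRC, and verify_stub are hypothetical names of mine, and the verifier is passed as source text rather than through inspect.getsource so the snippet runs on its own.

```python
from typing import Optional, Sequence

# Hypothetical stand-in for a verifier's source text (not the repository's verifier).
STUB_VERIFIER_SRC = "def verify_stub(h_values):\n    return float(np.sum(h_values))\n"


def build_program(
    generation: str,
    verifier_src_text: Optional[str],
    construction: Optional[Sequence[float]],
) -> str:
    """Mimic the control flow of the quoted preprocess_generation (simplified)."""
    if verifier_src_text is None:
        # Early return: nothing is prepended to the generated code.
        return generation
    base = "import numpy as np" + "\n\n" + verifier_src_text + "\n\n"
    if construction is not None:
        base += f"initial_h_values = np.array({list(construction)!r})" + "\n\n"
    return base + generation


# A generation that trusts the prompt and uses initial_h_values directly.
generation = "result = initial_h_values.sum()"

erdos_like = build_program(generation, None, [0.5, 0.5])           # verifier_src unset
ac_like = build_program(generation, STUB_VERIFIER_SRC, [0.5, 0.5])  # verifier_src set

print(erdos_like == generation)             # True: code is passed through unchanged
print("initial_h_values = " in ac_like)     # True: the construction is injected

# Running the unmodified program fails because the name was never defined.
try:
    exec(erdos_like, {})
    outcome = "ran"
except NameError as exc:
    outcome = f"NameError: {exc}"
print(outcome)
```

If my reading is right, an Erdos rollout that relies on the "pre-imported" names would fail in the same way as the last step here, unless the sandbox defines those names somewhere I have missed.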
Could you clarify whether this is intentional for Erdos? For example:
- Was there a separate Erdos evaluator used in your production runs that did set verifier_src?
- Or does the Erdos task intentionally require the model to be fully self-contained (import its own numpy, define its own verifier, generate its own starting construction)?
I want to make sure my reimplementation matches your actual experimental setup. Any guidance would be appreciated. Thanks!