Overnight autoresearch adaptation beats TinyRecursiveModels on Sudoku-Extreme: 92.2% exact acc in 5 min (vs paper's 87% in 18h) #369
VihariKanukollu started this conversation in Show and tell
Replies: 2 comments
- Impressive results, nice work!
- Excellent, yes: I would like to test your modified train.py and prepare.py and get back to you; great interest. I'm not sure how code should be shared here, but you could email it to diestel.research@gmail.com, confidentially, and your IP is personally guaranteed.
-
I adapted the autoresearch loop to run on TinyRecursiveModels by @jm_alexia, targeting the Sudoku-Extreme benchmark.
Results: 92.2% exact accuracy after 5 minutes of training (274 total runs, 263 metric runs), versus the paper's 87% after 18 hours. The agent found five key interventions over the course of the run.
The most interesting finding was #5: the agent deliberately weakened the input reinjection path so the model had to rely more on its recurrent state, which is the opposite of the original TRM design choice.
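To make #5 concrete, here is a toy sketch of what a weakened input-reinjection path looks like in a recurrent refinement loop. Everything below (function names, the tanh update, the `inject_scale` knob) is illustrative, not the actual TinyRecursiveModels code:

```python
import numpy as np

def recurrent_refine(x, steps=6, inject_scale=0.25, seed=0):
    """Toy recurrent refinement with a weakened input-reinjection path.

    A TRM-style step re-adds the input embedding x at every iteration.
    Scaling that path by inject_scale < 1 (hypothetical knob) forces the
    model to carry information in its recurrent state instead of reading
    it back from the input each step.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[-1]
    W = rng.normal(scale=0.1, size=(dim, dim))  # stand-in for a learned update
    state = np.zeros_like(x)
    for _ in range(steps):
        # Weakened reinjection: inject_scale * x instead of the full x.
        state = state + np.tanh((state + inject_scale * x) @ W)
    return state
```

Note that with `inject_scale=0.0` the input is ignored entirely (the state starts at zero and stays there), so the scale interpolates between pure-state recurrence and the full-strength reinjection of the original design.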
[Progress chart, annotated with breakthrough points]
Full thread with details: https://x.com/VihariKanukollu/status/2035411680050778435
This is an adaptation of the autoresearch seed repo applied to a non-nanochat benchmark — happy to share the modified train.py and prepare.py if there's interest.
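For anyone reproducing the 92.2% figure: "exact accuracy" on Sudoku-Extreme counts a puzzle as solved only if all 81 cells match the solution, with no partial credit. A minimal sketch of the metric (hypothetical helper; the real implementation lives in the modified train.py):

```python
import numpy as np

def exact_accuracy(preds: np.ndarray, targets: np.ndarray) -> float:
    """Fraction of puzzles solved exactly: all 81 cells must match.

    preds, targets: integer arrays of shape (batch, 81) holding digits 1-9.
    """
    solved = (preds == targets).all(axis=1)  # per-puzzle all-cells-correct
    return float(solved.mean())

# Example: 3 puzzles, one corrupted in a single cell -> 2/3 exact accuracy.
targets = np.tile(np.arange(81) % 9 + 1, (3, 1))
preds = targets.copy()
preds[2, 40] = (preds[2, 40] % 9) + 1  # flip one cell to a different digit
print(exact_accuracy(preds, targets))
```

The all-or-nothing criterion is why exact accuracy is much harsher than per-cell accuracy: a single wrong cell zeroes out the whole puzzle.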