Sub-issue of #72 (phase 2 of the suggested sequencing).
Problem. The current meta loop is population-based search with no gradient signal at the meta level. We spend many attempts re-discovering, via random sampling, things that credit assignment over operators / mutations would surface much earlier. Over a full run this compounds into a large token cost.
Directions.
- Track per-operator and per-mutation score deltas; use them to bias future operator sampling.
- Treat the meta-config (mutation weights, exploration/exploitation balance, etc.) as something we evolve with gradient signal rather than sample blindly.
- Roll out behind a flag and A/B against the current evolve loop on a fixed budget.
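
A minimal sketch of the first direction, to make the discussion concrete. Everything here is hypothetical: `OperatorStats`, its method names, the `temperature` and `floor` parameters, and the choice of a softmax over mean deltas are illustrative assumptions, not the current evolve loop's API. It keeps a running mean of score deltas per operator and turns those into sampling weights, with a uniform floor so that no operator is starved of attempts.

```python
import math
import random
from collections import defaultdict


class OperatorStats:
    """Running mean of score deltas per operator (hypothetical helper)."""

    def __init__(self, operators, temperature=1.0, floor=0.05):
        self.operators = list(operators)
        self.temperature = temperature  # assumed knob: lower = greedier
        self.floor = floor              # assumed knob: uniform mass kept for exploration
        self.count = defaultdict(int)
        self.mean_delta = defaultdict(float)

    def record(self, operator, parent_score, child_score):
        """Update the running mean of (child - parent) for one operator."""
        delta = child_score - parent_score
        self.count[operator] += 1
        n = self.count[operator]
        self.mean_delta[operator] += (delta - self.mean_delta[operator]) / n

    def probabilities(self):
        """Softmax over mean deltas, mixed with a uniform floor."""
        logits = [self.mean_delta[op] / self.temperature for op in self.operators]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        k = len(self.operators)
        return [(1 - self.floor) * e / total + self.floor / k for e in exps]

    def sample(self, rng=random):
        """Draw one operator according to the biased distribution."""
        return rng.choices(self.operators, weights=self.probabilities(), k=1)[0]
```

Usage would be roughly `stats.record(op, parent_score, child_score)` after each evaluated attempt and `stats.sample()` when picking the next operator. Unseen operators start at a mean delta of 0, so they are neither favored nor penalized until data arrives.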
Open questions.
- What is the cheapest useful gradient proxy: per-operator lift, or fuller credit assignment?
- How to handle the discrete / non-differentiable parts of the operator space.
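
One candidate for the discrete part, offered as a sketch rather than a proposal: keep a logit per operator and update the logits with a score-function (REINFORCE-style) estimator, which needs only the sampled choice and its reward, not a differentiable operator. The function names, the learning rate, and the use of a scalar baseline are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, chosen, reward, baseline, lr=0.1):
    """One score-function update on categorical logits.

    For a softmax policy, d log p(chosen) / d logit_i = 1[i == chosen] - p_i,
    so each logit moves by lr * (reward - baseline) * that term.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        x + lr * advantage * ((1.0 if i == chosen else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]
```

Here `reward` would be the score delta of the attempt and `baseline` something like a running mean of recent deltas, which reduces variance without biasing the update. Whether the variance is low enough at our attempt counts is itself part of the open question.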
Best taken on after #73 lands, so that the budget accounting we measure against is stable.