ERGO: Entropy-guided Resetting for Generation Optimization

ERGO is a model-agnostic inference-time framework that helps LLMs recover from context degradation and maintain high performance across multi-turn conversations.


Overview

ERGO introduces a paradigm shift in handling multi-turn LLM conversations by treating uncertainty as a first-class signal. When large language models get "lost" in extended conversations, ERGO detects these moments through entropy spikes and strategically resets the context, recovering both accuracy and reliability. This repository contains all code necessary to replicate our experiments and evaluate ERGO’s performance across a suite of models and multi-turn generation tasks.
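
In code, the trigger can be sketched roughly as follows. This is a minimal illustration of the idea, not the repository's implementation: the helper names and the log-probability format are hypothetical (ERGO's actual logic lives in core/ and generation/), and the default threshold mirrors the Quick Start example below.

import math

def mean_token_entropy(per_token_logprobs):
    """Average Shannon entropy (in nats) of the model's next-token
    distributions over one generated response.

    per_token_logprobs: for each generated token, a list of log-probabilities
    over the candidate next tokens (hypothetical format).
    """
    entropies = [
        -sum(math.exp(lp) * lp for lp in logprobs)
        for logprobs in per_token_logprobs
    ]
    return sum(entropies) / len(entropies)

def should_reset(per_token_logprobs, threshold=0.5):
    # An entropy spike above the threshold signals that the model is "lost";
    # ERGO then resets the context instead of continuing the degraded thread.
    return mean_token_entropy(per_token_logprobs) > threshold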

Core Results

  • Average performance gain: 56.6%
  • Peak capability increase: 24.7%
  • Decrease in unreliability: 35.3%

Quick Start

Prerequisites

# Clone the repository
git clone https://github.com/haziq-exe/ERGO.git
cd ERGO
# Install dependencies
pip install -r requirements.txt
  • To use OpenAI models, set the OPENAI_KEY environment variable to your API key (see the example below).

  • You will need to download the sharded dataset from Laban et al.
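
For example, to set the key in a POSIX shell:

export OPENAI_KEY="your-api-key-here"   # placeholder; substitute your own key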

Basic Usage

from experiments.runExperiment import RunExperiment

# Initialize experiment with your chosen model
experiment = RunExperiment(
    model_name="HuggingFaceTB/SmolLM-135M-Instruct",
    device="cpu",
    device_map=None,
    max_new_tokens=1000
)

# Run ERGO on GSM8K dataset
experiment.run_GSM8K(
    dataset_path="sharded_dataset.json",      # path to sharded dataset from Laban et al.
    num_Qs=20,                                # number of questions to evaluate
    num_runs=1,                               # independent runs per question
    threshold=0.5,                            # entropy threshold that triggers a context reset
    output_path="outputs/gsm8k_example.json"  # where results are written
)

Alternatively, run the bundled example script from the repository root:

python -m main.example_main
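
The same constructor arguments should carry over to the larger models from the results table. A hedged sketch for a GPU machine, assuming RunExperiment forwards device and device_map to Hugging Face transformers as the CPU example suggests (the model name is illustrative):

from experiments.runExperiment import RunExperiment

# Hypothetical GPU configuration
experiment = RunExperiment(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    device="cuda",
    device_map="auto",        # let transformers place model shards automatically
    max_new_tokens=1000
)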

Repository Structure

ERGO/
│
├── evaluation/         # Evaluation metrics and scoring
│   ├── evaluator.py
│   ├── utils.py
│   └── eval.bfcl.py    # Taken from Laban et al.
│
├── core/               # Core ERGO implementation
│   ├── dataset.py
│   ├── model.py
│   └── utils.py
│
├── experiments/        # Experiment runner
│   └── runExperiment.py
│
├── generation/         # Generate with ERGO
│   └── generator.py
│
└── main/               # Example scripts
    └── example_main.py

Evaluated Tasks

ERGO has been rigorously tested across five diverse generation tasks:

Task          Dataset         Description                          Metric
Math          GSM8K           Elementary math word problems        Exact Match
Code          LiveCodeBench   Python function generation           Test Suite Pass
SQL           Spider          Text-to-SQL query generation         Query Accuracy
API Calls     Berkeley FCL    Function calling from instructions   Call Validity
Data-to-Text  ToTTo           Table caption generation             BLEU Score

Key Results

Average Performance Across Models

Model          FULL   SHARDED   ERGO   Relative Improvement
GPT-4o         79.2   51.4      74.1   +44.2%
GPT-4.1        83.6   56.6      77.0   +36.0%
GPT-4o-mini    73.8   44.3      71.8   +62.1%
Phi-4          64.6   36.4      59.2   +62.6%
LLaMA-3.1-8B   46.0   28.7      50.9   +77.4%

Relative improvement is ERGO over the SHARDED setting (e.g., GPT-4o: 74.1 / 51.4 ≈ +44.2%).

Citation

If you use ERGO in your research, please cite our paper:

@inproceedings{mohammad-khalid-etal-2025-ergo,
    title = "{ERGO}: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models",
    author = "Mohammad Khalid, Haziq  and
      Jeyaganthan, Athikash  and
      Do, Timothy  and
      Fu, Yicheng  and
      Sharma, Vasu  and
      O{'}Brien, Sean  and
      Zhu, Kevin",
    booktitle = "Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.uncertainlp-main.23/",
    pages = "273--286",
    ISBN = "979-8-89176-349-4"
}

Contact

Lead Author: Haziq Mohammad Khalid
📧 haziqkhalid04@gmail.com

Co-Author: Timothy Do
📧 tim.do.info@gmail.com

Code References

The BFCL evaluation script (eval.bfcl.py) and the sharded datasets are taken from Laban et al.
