LLM Flow

Lightweight framework for building multi-step pipelines, called flows, powered by large language models. Each step can answer prompts, classify from predefined options, or build on previous results to compose complex multi-step tasks. Designed for simplicity and scalability, LLM Flow makes it easy to prototype, iterate, and deploy intelligent workflows on large datasets.

Installation

  1. We recommend using uv to manage your Python environment and install dependencies:

uv sync

  2. Download tabular data files to the data/ directory. Data must be in Parquet or CSV format.

Configuration

This project uses Hydra for configuration management. These configurations define not just the model and data parameters, but also the entire pipeline behavior.

  • Default configs live in the configs/ directory.
  • You can override any parameter from the command line.
  • For more details, see the documentation in the respective subdirectories of configs/.

Example:

python -m llm_flow \
    model.name=openai/gpt-oss-120b \
    model.temperature=0.5 \
    data_dir=data/processed \
    flow=default

Designing Flows

For detailed documentation, see configs/flow.

Flows define multi-step interactions, written in simple YAML. Each step builds on previous outputs, allowing you to design clear, structured LLM workflows.

Flow definitions live in the configs/flow directory. You can select which flow to run via the config file or the Hydra CLI (e.g. flow=default).
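
As an illustration, a two-step flow could be sketched in YAML like this. The key names (steps, prompt, options) and the {placeholder} syntax are assumptions for illustration only; consult the documentation in configs/flow for the actual schema:

```yaml
# Hypothetical two-step flow -- key names are illustrative, not the real schema.
steps:
  topic:
    # A classification step: the model picks one of the predefined options.
    prompt: "Which topic best describes this article? {text}"
    options: [politics, business, sports, technology]
  summary:
    # A follow-up step that builds on the previous step's output.
    prompt: "Summarize this {topic} article in two sentences: {text}"
```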

Data Format

For our original experiments, we used news articles from the GNews API.

Place your data in the data/ directory (or override data_dir via config/CLI). Data must be in Parquet or CSV format. Each file in the directory is treated as part of the dataset.
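
The loading behavior described above can be sketched roughly as follows (pandas-based; load_dataset is a hypothetical helper, not the framework's actual API). The added filename and file_index columns mirror the provenance columns that later appear in results.csv:

```python
from pathlib import Path

import pandas as pd


def load_dataset(data_dir: str) -> pd.DataFrame:
    """Load every Parquet/CSV file in data_dir into one DataFrame,
    tagging each row with its source file and position within it."""
    frames = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix == ".parquet":
            df = pd.read_parquet(path)
        elif path.suffix == ".csv":
            df = pd.read_csv(path)
        else:
            continue  # skip non-tabular files
        df["filename"] = path.name          # provenance: source file
        df["file_index"] = range(len(df))   # provenance: row within that file
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```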

Quickstart

See run.sh for an example SLURM job script, designed for the ETH Zurich Euler cluster.

  1. Prepare your data in data/.

  2. Adjust parameters and prompts in configs/config.yaml if needed.

  3. Run:

    python -m llm_flow

  4. Results are saved automatically to the Hydra run directory (default: outputs/YYYY-MM-DD/HH-MM-SS-<SLURM_JOB_ID>/).

Outputs

Each run produces:

  • results.csv — classification results with reasoning traces
  • Hydra config snapshot (.hydra/) for reproducibility
  • Log file (__main__.log)

The results.csv file has the following format:

  • filename: Name of the source file the row's item came from
  • file_index: Index of the item within that file
  • <step_name>: One column per flow step, containing the LLM's output for that step
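
Given this layout, step outputs can be joined back onto the original rows. A minimal pandas sketch, assuming the source table's row positions line up with file_index (attach_results is a hypothetical helper, not part of the framework):

```python
import pandas as pd


def attach_results(source: pd.DataFrame, results: pd.DataFrame,
                   filename: str) -> pd.DataFrame:
    """Join one file's step outputs from results.csv back onto its source rows."""
    subset = (
        results[results["filename"] == filename]  # rows from this file only
        .set_index("file_index")                  # align on original row position
        .drop(columns="filename")
    )
    return source.join(subset)  # left join: keeps every source row
```

Because this is a left join, source rows with no matching result come back as NaN, which makes partially processed datasets easy to spot.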

