
Conversation

@forsyth2

This PR should NOT be merged. It is to show a prototype developed for the January 2026 LLNL Datathon. See here for an earlier (unrelated) zppy "hackathon" project.

Goal: a user should be able to give a simulation data directory path and get back a zppy cfg that, if not perfect, at least provides a strong starting point.

Motivating need: the zppy cfg is quite complex, with a very large number of parameters. It would therefore be useful to create custom starter cfg files for users to build on.

Architecture: This implementation consists of 3 layers, each building on the one below (a hypothetical usage sketch follows the list).

  1. A script to extract available data from a simulation output directory. That's simulation_output_reviewer.py.
  2. A script to generate a starting point cfg file based on the data found. That's zppy_config_generator.py.
  3. An agentic AI question-answering system that would allow users to customize that cfg (the "answer" would be a better cfg than the starting point). Examples: a user could request "diagnostics relating to this particular physical phenomenon" or "diagnostics on a hindcast during these particular years".
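
For orientation, here is a minimal sketch of how layers 1 and 2 could chain together. The function names, file patterns, and cfg sections below are illustrative assumptions, not the actual APIs of simulation_output_reviewer.py or zppy_config_generator.py.

```python
# Hypothetical sketch only: function names, file patterns, and cfg keys are placeholders.
from pathlib import Path

def review_simulation_output(sim_dir: Path) -> dict:
    """Layer 1 (placeholder): record which output streams appear to be present."""
    return {
        "atm_monthly": any(sim_dir.rglob("*.eam.h0.*.nc")),
        "ocn_monthly": any(sim_dir.rglob("*.mpaso.hist.*.nc")),
    }

def generate_config(inventory: dict) -> str:
    """Layer 2 (placeholder): emit a starter cfg enabling tasks only where data exists."""
    lines = ["[default]", "input = /path/to/simulation", ""]
    if inventory["atm_monthly"]:
        lines += ["[climo]", "active = True", ""]
    if inventory["ocn_monthly"]:
        lines += ["[mpas_analysis]", "active = True", ""]
    return "\n".join(lines)

print(generate_config(review_simulation_output(Path("/path/to/simulation"))))
```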

Layer 3 is more exploratory and the core of the Datathon challenge. Layers 1-2 will likely be cleaned up and merged into zppy as a distinct PR at a later date.

I've gone with this approach because much of the zppy cfg can in fact be constructed deterministically, lending itself more to a script than to an AI agent. However, the more natural-language-oriented layer 3 is a better fit for an AI agent, and the agent also gets a more-informed starting point rather than needing to "learn" all the rules of our data structuring.
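
To make the layer 2 / layer 3 handoff concrete, here is a rough sketch of how an agent prompt could embed the deterministic starting point so the model never has to rediscover the data-layout rules. The prompt wording, the cfg contents, and the example request are assumptions for illustration.

```python
# Illustrative only: the starter cfg, user request, and prompt wording are placeholders.
starter_cfg = """\
[default]
input = /path/to/simulation

[climo]
active = True
"""

user_request = "Add diagnostics for a hindcast covering years 1985-1995."

prompt = (
    "You are helping configure zppy, an E3SM post-processing tool.\n"
    "Below is a starter cfg generated deterministically from the simulation's available data.\n\n"
    f"{starter_cfg}\n"
    f"User request: {user_request}\n"
    "Return an updated cfg that satisfies the request without enabling tasks "
    "for data that does not exist."
)
print(prompt)
```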

forsyth2 self-assigned this Jan 21, 2026

forsyth2 commented Jan 21, 2026

Current status (end of datathon day 1): Layer 1 is complete. Layer 2 is in progress, nearing completion.

Layer 2 action items:

  • Some subtasks are missing even though I know the data is available. This is likely due to a bug in the Dependency logic.
  • Find a way to infer the various data paths.
  • Find a better way to determine/suggest year increments (one possible heuristic is sketched after this list).
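
On year increments, one possible heuristic (sketched below with assumed candidate increments and an assumed `years = start:end:increment` formatting) would be to pick the largest candidate that divides the run evenly into a small number of chunks.

```python
# One possible heuristic; the candidate increments and years-string format are assumptions.
def suggest_year_increment(start_year: int, end_year: int) -> int:
    span = end_year - start_year + 1
    for candidate in (50, 20, 10, 5, 2, 1):
        # Prefer large increments that split the run into a handful of even chunks.
        if span % candidate == 0 and span // candidate <= 10:
            return candidate
    return span  # fall back to one chunk covering the whole run

start, end = 1, 100
print(f"years = {start}:{end}:{suggest_year_increment(start, end)}")  # years = 1:100:50
```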

Layer 3 action items:

  • Add ollama to the environment and begin experimenting with it to construct the question-answering system (a minimal availability check is sketched below).
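
A minimal sketch of what that first experiment might look like, assuming the standard ollama CLI is installed system-wide; the model name here is only an example, not a final choice.

```python
# Sketch: verify ollama is on PATH and fetch a small model to experiment with.
import shutil
import subprocess
import sys

if shutil.which("ollama") is None:
    sys.exit("ollama not found; install it from https://ollama.com/download")

subprocess.run(["ollama", "pull", "llama3.1:8b"], check=True)  # download the model
print(subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout)
```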

@forsyth2

The Layer 2 action items above were implemented in the second commit.

@forsyth2 left a comment

The third commit is an initial implementation of an agentic AI workflow. run_agent.py was largely generated with Claude, with fixes added by me. I will now proceed with actually installing ollama and testing it out.
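
For reference, the following is not run_agent.py itself, just a minimal sketch of the kind of request such a workflow can send to a locally running Ollama server (its REST endpoint defaults to port 11434); the model name and prompt are placeholders.

```python
# Minimal sketch of a single generation request against a local Ollama server.
# Assumes `ollama serve` is running; the model name and prompt are placeholders.
import json
import urllib.request

payload = {
    "model": "llama3.1:8b",
    "prompt": "Given this zppy cfg, enable e3sm_diags for years 1-20:\n[default]\n...",
    "stream": False,  # return a single JSON object instead of streamed chunks
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```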

Notes from Claude on Ollama:

Usage Limits - Completely Unlimited! 🎉
Ollama is 100% free and open-source with NO limits:

✅ No token limits - Process as much as you want
✅ No API costs - Runs locally on your machine
✅ No internet required - After downloading models, works offline
✅ No rate limits - Run as many queries as you need
✅ Privacy - Your data never leaves your machine

Requirements:

Disk space: ~4.7GB for llama3.1:8b, ~40GB for llama3.1:70b
RAM: 8GB minimum for 8B model, 64GB for 70B model
GPU: Optional but highly recommended (10x+ faster with GPU)

Recommended for your use case:

Start with llama3.1:8b - fast, runs on most machines
If accuracy isn't perfect, try llama3.1:70b (needs more resources)
Or try llama3.2:3b if you have limited RAM

print("Error: ollama not found. Please install ollama first:", file=sys.stderr)
print(" https://ollama.ai", file=sys.stderr)
print(
" Official instructions: https://ollama.com/download. For Linux, that shows `curl -fsSL https://ollama.com/install.sh | sh`. After installation, verify that Ollama is available by running `ollama --version`",

@forsyth2

Following directions from @tomvothecoder in https://github.com/aims-group/llnl-datathon-2026/pull/1/files:

### 3. Option B: Local LLM via Ollama (Optional)

Ollama is used as a **local LLM service** for agentic workflows.

> **Important:** Ollama is **not managed by Conda**.
> It must be installed at the system level.
Install Ollama by following the official instructions:
[https://ollama.com/download](https://ollama.com/download)
