Picochat is an honest SLM training factory: dataset import, tokenizer training, base pretraining, SFT, optional DPO, eval, serving, and release gates in one inspectable repo.
Picochat is a from-scratch pipeline for building, checking, and releasing small language models without hiding data leakage, memorization, weak evals, or GPU-wasting launch mistakes.
It is inspired by Andrej Karpathy's nanochat, but the product goal is different. Picochat is not trying to claim frontier behavior from a tiny run. It is trying to make the whole small-model factory inspectable:
dataset -> tokenizer -> base pretraining -> chat SFT -> optional DPO -> eval -> release gate
Picochat now exposes two different training starts:
- Train from scratch: `picochat run tiny` builds a Picochat-native model from dataset pack through tokenizer, base pretraining, SFT, eval, and release gates.
- Fine-tune an existing model: `picochat train hf-sft` starts from an existing Hugging Face causal LM such as SmolLM and trains on Picochat chat JSONL or multi-turn `messages` rows with assistant-only labels. This path is useful for hackathons and Qwen/SmolLM-style LoRA experiments.
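Assistant-only labels can be illustrated with a minimal sketch. It is not Picochat's implementation: the segment layout, the function name, and the `-100` mask value (the default `ignore_index` of PyTorch's cross-entropy loss) are assumptions chosen for illustration.

```python
from typing import Dict, List

# Assumption: -100 matches the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def build_assistant_only_labels(segments: List[Dict]) -> Dict[str, List[int]]:
    """Concatenate per-message token IDs and mask every non-assistant token.

    Each segment is a hypothetical dict: {"role": str, "token_ids": List[int]}.
    Labels are unshifted; the next-token shift is left to the loss function.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for segment in segments:
        ids = segment["token_ids"]
        input_ids.extend(ids)
        if segment["role"] == "assistant":
            labels.extend(ids)  # assistant tokens contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # user/system tokens are skipped
    return {"input_ids": input_ids, "labels": labels}


example = build_assistant_only_labels([
    {"role": "user", "token_ids": [11, 12, 13]},
    {"role": "assistant", "token_ids": [21, 22]},
    {"role": "user", "token_ids": [14]},
    {"role": "assistant", "token_ids": [23, 24, 25]},
])
print(example["labels"])
# [-100, -100, -100, 21, 22, -100, 23, 24, 25]
```

With this masking, a multi-turn row trains the model only on what the assistant said, so user prompts are context rather than targets.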
See the Training Paths page before choosing a GPU workflow.
Picochat is a research-grade training harness and local workbench. The current
recommended public proof path is a one-GPU h100-100m run on a bounded
SmolLM-Corpus pack (fineweb-edu-dedup plus cosmopedia-v2) with explicit
sanity, preflight, dry-run, release-skills SFT/eval, and evidence-bundle
steps. The 1B-class h200-1b-ddp8 path remains prepared for a later
8xH100/H200 run.
Public model evidence is still pending. Picochat should not be judged by a claimed 1B result until a trained model, release card, model card, benchmark report, and contamination report are published together. The current evidence plan is tracked in the Model Evidence page.
| Artifact | Public status | Release rule |
|---|---|---|
| Local tiny demo | Ready | Smoke test only; not a model-quality claim |
| 100M H100/H200 public proof | Runbook ready | Publish only with eval, samples, gate report, and honesty report |
| 1B `h200-1b-ddp8` run | Prepared, not claimed | Publish only after preflight, DDP dry run, full eval, external benchmark, and release gate |
| Hugging Face model | Pending | Must include model card, release manifest, and honesty evidence |
What is ready:
- 1B-class decoder-only GPT stack: RoPE, RMSNorm, SwiGLU, GQA, QK norm, tied embeddings, parallel residual, scaled residual init, BF16, torch.compile, gradient checkpointing, and CUDA/DDP.
- ClimbMix import with corpus manifests, document-boundary checks, and sharded token loaders.
- Release-oriented SFT/eval packs for identity, refusal, choice, arithmetic, and spelling.
- Preflight checks that block unsafe or dishonest long runs before training.
- Post-run gates that block release when SFT fit, held-out fit, visible eval, external benchmarks, prompt echo, refusal behavior, or honesty checks fail.
- A local web dashboard with release readiness, loss curves, preflight output, Scale Up commands, paid-GPU confirmation, and DDP dry-run commands.
- Native PyTorch serving through `picochat serve`, including local OpenAI-compatible `/v1/completions`, `/v1/chat/completions`, and `/v1/models` endpoints plus `stream=true` SSE response framing for smoke integrations.
- Optional post-SFT DPO through `picochat train dpo` for curated preference pairs when teams have real chosen/rejected examples.
- Dockerized local workbench and serving smoke paths for reproducible demos.
- HF-style export with model card, release manifest, Transformers `trust_remote_code` adapter, optional safetensors, and optional `--push-to-hub` publishing.
- Public benchmark protocol, CI, and contribution templates for external review.
What is not claimed:
- Picochat is not a production assistant.
- Picochat is not RAG.
- Picochat does not claim a useful 1B model before the 1B run and gates pass.
- Synthetic SFT is behavior-focused; it does not magically create knowledge the base model never learned.
Small-model projects often fail in predictable ways: eval prompts leak into SFT, validation text overlaps training data, tiny corpora are replayed hundreds of times, losses are shown without context, checkpoints corrupt on crash, and large GPU launches start before anyone has run a real preflight.
Picochat treats those as product problems, not afterthoughts.
The factory is built around four principles:
- Train visibly. Every stage writes artifacts that can be inspected and compared.
- Gate honestly. Preflight and release checks can block the run or block release.
- Separate practice from scoring. SFT rows are practice; eval rows are the scoreboard.
- Protect GPU spend. Scale-up commands include sanity checks, preflight, a short DDP dry run, and explicit paid-launch confirmation.
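The "separate practice from scoring" principle can be sketched as a prompt-overlap check. This is an illustrative example, not Picochat's contamination checker: the function names and the normalization rule are assumptions, and real checks typically also look at n-gram overlap against the corpus.

```python
import re
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumeric runs so trivial edits still match."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def leaked_eval_prompts(sft_prompts: Iterable[str], eval_prompts: Iterable[str]) -> List[str]:
    """Return eval prompts whose normalized form also appears in the SFT set."""
    seen: Set[str] = {normalize(prompt) for prompt in sft_prompts}
    return [prompt for prompt in eval_prompts if normalize(prompt) in seen]


# Hypothetical rows, for illustration only.
sft = ["What is 2 + 2?", "Spell 'cat'."]
evals = ["what is 2+2", "Spell 'dog'."]
leaked = leaked_eval_prompts(sft, evals)
print(leaked)
# ['what is 2+2']
```

Any non-empty result means a scoreboard row was also a practice row, so its eval score says little about generalization.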
Picochat's paid-run path is still deliberately conservative: the 1B release recipe uses DDP, while experimental FSDP is exposed for base-training smoke tests before it graduates into the full factory.
On GPU hosts, `picochat sanity preh100 --capacity-scale h200-1b-ddp8` can also run
a one-batch memory check for the exact scale before training starts.
Install locally:

    git clone https://github.com/gowtham0992/picochat.git
    cd picochat
    python3 -m venv .venv
    source .venv/bin/activate
    python -m pip install -e ".[dev,hf]"

The installed command is `picochat`. A shorter `pico` alias is also provided,
but `picochat` avoids colliding with the system `pico` editor when the virtual
environment is not active.
Run the tiny demo:

    picochat demo

Open the workbench:

    picochat web --runs-dir runs --port 8765

Then visit:

    http://127.0.0.1:8765

Or start the workbench with Docker:

    docker compose up --build picochat-web

Serve a trained checkpoint through a local OpenAI-compatible API:

    export PICOCHAT_API_KEY="replace-me"
    picochat serve \
      --checkpoint runs/pico-demo/sft/checkpoint \
      --tokenizer runs/pico-demo/tokenizer.json \
      --host 127.0.0.1 \
      --port 8000 \
      --api-key-env PICOCHAT_API_KEY

Then call:

    curl http://127.0.0.1:8000/v1/chat/completions \
      -H 'content-type: application/json' \
      -H "authorization: Bearer $PICOCHAT_API_KEY" \
      -d '{"model":"picochat","messages":[{"role":"user","content":"What is Picochat?"}],"max_tokens":80}'

This native server is for local smoke tests and integration work. High-throughput production serving through vLLM, TGI, TensorRT-LLM, or llama.cpp remains future adapter work.
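The same request can be built from Python with only the standard library. This is a minimal sketch that mirrors the `curl` call above; the helper names are hypothetical, and the shape of the response body is an assumption (it is only decoded as JSON here, not interpreted).

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str,
                       max_tokens: int = 80) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": "picochat",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body. Needs a running server."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network.
request = build_chat_request("http://127.0.0.1:8000", "replace-me", "What is Picochat?")
print(request.get_method(), request.full_url)
# POST http://127.0.0.1:8000/v1/chat/completions
```

Once `picochat serve` is running, `send(request)` should return the decoded response. In real use, read the key from the `PICOCHAT_API_KEY` environment variable rather than hard-coding it.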
Run optional DPO after SFT when you have real preference pairs:

    picochat data preference-starter \
      --input runs/pico-demo/chat_benchmark.jsonl \
      --out data/preferences.jsonl
    picochat train dpo \
      --input data/preferences.jsonl \
      --tokenizer runs/pico-demo/tokenizer.json \
      --checkpoint runs/pico-demo/sft/checkpoint \
      --out-dir runs/pico-demo/dpo \
      --learning-rate 0.000005 \
      --beta 0.1

Or wire DPO into the end-to-end factory so fit/eval/release gates score the post-DPO checkpoint:

    picochat run tiny \
      --out-dir runs/pico-demo \
      --dataset-pack runs/pack/dataset_pack.json \
      --dpo-input data/preferences.jsonl \
      --dpo-steps 200

Preference rows are JSONL with `user` or `prompt`, `chosen`, and `rejected`
fields. DPO improves preference alignment after SFT; it does not replace base
pretraining, SFT coverage, or the release gates.
The preference starter uses synthetic negatives for plumbing and smoke tests;
release alignment needs human or judge-reviewed preference pairs.
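A small validator can catch malformed preference rows before a DPO run. This is a sketch under stated assumptions, not Picochat's loader: the function name is hypothetical, and rejecting rows where `chosen` equals `rejected` is an extra rule added here for illustration.

```python
import json
from typing import Dict, Iterable, List


def load_preference_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse JSONL preference rows into {"prompt", "chosen", "rejected"} dicts."""
    rows: List[Dict[str, str]] = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        prompt = row.get("user") or row.get("prompt")  # either key is accepted
        chosen = row.get("chosen")
        rejected = row.get("rejected")
        if not prompt or not chosen or not rejected:
            raise ValueError(f"line {line_number}: need user/prompt, chosen, and rejected")
        if chosen == rejected:
            # Assumption: an identical pair carries no preference signal.
            raise ValueError(f"line {line_number}: chosen and rejected are identical")
        rows.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return rows


# Hypothetical rows, for illustration only.
sample = [
    '{"user": "What is Picochat?", "chosen": "A small-model training factory.", "rejected": "A frontier model."}',
    '{"prompt": "Spell cat.", "chosen": "c-a-t", "rejected": "k-a-t"}',
]
rows = load_preference_rows(sample)
print(len(rows), rows[1]["prompt"])
# 2 Spell cat.
```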
Build a model registry from completed runs:

    picochat registry --runs-dir runs \
      --out reports/model_registry.md \
      --json-out reports/model_registry.json

Write a standard lm-eval-harness benchmark command for a HF export:

    picochat eval lm-harness \
      --model-path exports/picochat-run \
      --tasks arc_easy,hellaswag \
      --out-dir reports/picochat-run/lm_eval \
      --device cuda:0 \
      --dry-run

The 1B-class path is intentionally gated. The short version is:
setup -> sanity -> ClimbMix import -> release skills pack
-> preflight -> 100-step DDP dry run -> full run -> SFT/eval -> release gate
Read the 8xH200 1B runbook before spending GPU money.
The current h200-1b-ddp8 scale targets about 1.12B parameters and 22.4B
planned training tokens, roughly 20 tokens per parameter.
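The tokens-per-parameter figure is simple arithmetic. The sketch below checks it for the two scales this README names, using the parameter and token counts it states; the function name is hypothetical.

```python
def tokens_per_parameter(planned_tokens: float, parameters: float) -> float:
    """Planned training tokens divided by model parameters."""
    return planned_tokens / parameters


# Figures as stated in this README.
scales = {
    "h200-1b-ddp8": {"parameters": 1.12e9, "planned_tokens": 22.4e9},
    "h100-100m": {"parameters": 107e6, "planned_tokens": 2.16e9},
}

ratios = {
    name: tokens_per_parameter(scale["planned_tokens"], scale["parameters"])
    for name, scale in scales.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} tokens per parameter")
# h200-1b-ddp8: 20.0 tokens per parameter
# h100-100m: 20.2 tokens per parameter
```

Both scales land near the same ratio, so the 100M run is a scaled-down rehearsal of the 1B budget rather than a differently shaped experiment.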
For the next paid run, use the 100M runbook before attempting 1B:
- 100M public proof runbook
- Dataset: bounded SmolLM-Corpus local pack.
- Scale: `h100-100m`, about 107M parameters and 2.16B planned tokens.
- Gate: `skill_release`, with identity, refusal, choice, arithmetic, and spelling eval coverage.
This path is intentionally smaller, cheaper, and easier to publish honestly than the 1B run. It is the first model-quality proof target.
Picochat does not treat a completed run as a release. A run can finish training and still be blocked.
The release gate checks:
- preflight status
- token/parameter budget and corpus replay risk
- SFT fit and held-out SFT fit
- visible eval pass rate
- per-skill thresholds for identity, refusal, choice, math, and spelling
- external benchmark presence
- prompt echo and refusal behavior
- corpus/SFT/eval contamination signals
- data honesty report issues
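The "finished is not released" idea can be sketched as an all-checks-must-pass aggregation. This is an illustration, not Picochat's gate: the `GateCheck` type, the check names, and the output shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateCheck:
    """One release-gate result (hypothetical structure)."""
    name: str
    passed: bool
    detail: str = ""


def release_decision(checks: List[GateCheck]) -> dict:
    """Allow release only when every check passed; list what blocked it."""
    failed = [check for check in checks if not check.passed]
    return {
        "release_allowed": not failed,
        "blocked_by": [check.name for check in failed],
    }


# Hypothetical results for a run that finished training.
checks = [
    GateCheck("preflight", True),
    GateCheck("visible_eval_pass_rate", True),
    GateCheck("external_benchmark_present", False, "no benchmark report found"),
]
decision = release_decision(checks)
print(decision)
# {'release_allowed': False, 'blocked_by': ['external_benchmark_present']}
```

A single failing check blocks release, which is how a run can complete training and still not ship.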
The local dashboard reads real run artifacts. It does not display fake training progress.
Key stations:
- Dataset Bay: corpus preview, import, pack generation, tuning inspection, launch preflight, and CLI command preview.
- Tokenizer Lab: text-to-token-ID inspection.
- Training Dash: base/SFT loss and BPB curves with lower-is-better context.
- Eval Scoreboard: pass/fail rows, failure causes, prompt echo, and repair guidance.
- Release Readiness: the post-run gate in both beginner and research modes.
- Scale Up: remote setup, sanity, import, benchmark, preflight, DDP dry run, full train, bundle, and return commands.
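For readers new to the BPB curves in the Training Dash, bits per byte converts a token-level cross-entropy loss into a tokenizer-independent number. The sketch below uses the standard conversion and assumes the loss is a mean in nats per token; how Picochat computes its own curves is not shown here.

```python
import math


def bits_per_byte(mean_loss_nats: float, token_count: int, byte_count: int) -> float:
    """Convert mean cross-entropy (nats per token) into bits per byte of text.

    token_count is the number of scored tokens; byte_count is the UTF-8 size
    of the same text.
    """
    total_nats = mean_loss_nats * token_count
    return total_nats / (math.log(2) * byte_count)


# Hypothetical numbers: loss 2.0 nats/token over 1,000 tokens covering 4,000 bytes.
bpb = bits_per_byte(2.0, token_count=1000, byte_count=4000)
print(round(bpb, 4))
# 0.7213
```

Lower is better. Because the denominator is bytes rather than tokens, BPB stays comparable across runs that use different tokenizers.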
- Product page
- Model evidence
- Architecture
- Pipeline guide
- 100M public proof runbook
- Benchmark protocol
- Model registry
- Release gates
- 8xH200 1B runbook
- Contamination and honesty
- Deployment
- Task mixture recipe
- Screenshot capture guide
To publish the product page with GitHub Pages, set the repository Pages source
to the docs/ folder on the develop or main branch.
Run tests:

    pytest -q

Run only web/dashboard checks:

    pytest tests/test_web.py -q

See CONTRIBUTING.md for PR standards and release-evidence expectations.
Optional TensorBoard logging:

    python -m pip install -e ".[monitor]"
    picochat run tiny \
      --out-dir runs/monitored-smoke \
      --tensorboard-log-dir runs/monitored-smoke/tensorboard
    tensorboard --logdir runs/monitored-smoke/tensorboard

License: MIT. See LICENSE.



