snipsnap

Aggressive LLM switching operator for resource-starved Kubernetes clusters.

snipsnap manages a single GPU, loading one model at a time. When a different model is requested via the OpenAI-compatible API, the current model is immediately terminated and the new one is spun up. Persistent volume caching avoids re-downloading model weights.

Architecture

Client (OpenAI SDK)
       |
       v
 snipsnap Proxy (:8000)     <-- OpenAI-compatible API
       |
       v
 Workspace Controller       <-- Detects model mismatch, kills old pod, creates new
       |
       v
 Inference Pod (Ollama/vLLM) <-- Mounts cache PVC, claims GPU
       |
       v
   GPU (RTX 4090)
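
The switch decision in the diagram can be modelled as a small pure function. The sketch below is illustrative only and is not the operator's actual code (the repository itself is Go); `Action`, `plan_switch`, and the verb strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    """One step the controller would take (hypothetical representation)."""
    verb: str   # "delete_pod" or "create_pod"
    model: str  # name of the Model the step applies to


def plan_switch(active: Optional[str], requested: str) -> List[Action]:
    """Return the steps needed so that `requested` is the model on the GPU.

    active    -- model currently loaded, or None if the GPU is idle
    requested -- model named in the incoming API request
    """
    if active == requested:
        # No mismatch: the running inference pod can serve the request.
        return []
    actions: List[Action] = []
    if active is not None:
        # Only one model is loaded at a time, so the old pod goes first.
        actions.append(Action("delete_pod", active))
    actions.append(Action("create_pod", requested))
    return actions


print(plan_switch(None, "llama3"))
print(plan_switch("llama3", "llama3"))
print(plan_switch("llama3", "mistral-7b"))
```

Keeping the decision separate from the Kubernetes calls makes the single-GPU rule easy to unit test without a cluster.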

Quickstart

Prerequisites

Kubernetes cluster with GPU support (NVIDIA device plugin)
kubectl configured
Helm 3+

Deploy with Helm

CRDs ship inside the chart at charts/snipsnap/crds/, so Helm installs them automatically the first time the chart is installed (Helm does not update CRDs from crds/ on later upgrades):

helm install snipsnap charts/snipsnap --namespace snipsnap --create-namespace

For a local dev cluster, use the values-dev.yaml overrides (always-pull image, metrics on, sample model pre-seeded) via the convenience target:

make dev-deploy

Register models

The chart templates Model CRs from the models: array in your values file. Add entries inline:

models:
  - name: llama3
    url: "ollama://llama3"
    engine: OLlama
    cache:
      enabled: true
      storageSize: "20Gi"
    resources:
      limits:
        nvidia.com/gpu: "1"

Then run helm upgrade snipsnap charts/snipsnap --namespace snipsnap -f your-values.yaml to apply the change to the release installed above.
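
To make the relationship between a values entry and the resulting Model CR concrete, here is a rough Python sketch of the mapping. It assumes that `name` becomes `metadata.name` and every other key is copied under `spec`, which matches the two examples in this README but has not been checked against the chart's templates; `model_manifest` is a hypothetical helper, not part of snipsnap.

```python
import json

# apiVersion as shown in the Model example in this README.
API_VERSION = "snipsnap.xgeeks.com/v1"


def model_manifest(entry: dict) -> dict:
    """Build a Model manifest from one item of the `models:` values array.

    Assumption: `name` maps to metadata.name, all other keys go under spec.
    """
    spec = {key: value for key, value in entry.items() if key != "name"}
    return {
        "apiVersion": API_VERSION,
        "kind": "Model",
        "metadata": {"name": entry["name"]},
        "spec": spec,
    }


entry = {
    "name": "llama3",
    "url": "ollama://llama3",
    "engine": "OLlama",
    "cache": {"enabled": True, "storageSize": "20Gi"},
    "resources": {"limits": {"nvidia.com/gpu": "1"}},
}
print(json.dumps(model_manifest(entry), indent=2))
```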

Use the API

# The proxy auto-switches models. First request to llama3 will load it:
curl http://snipsnap-api.snipsnap:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

# Requesting mistral-7b (assuming a Model with that name is registered) terminates llama3 and loads mistral-7b:
curl http://snipsnap-api.snipsnap:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello!"}]}'

# List available models:
curl http://snipsnap-api.snipsnap:8000/v1/models
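
The same chat request can be issued from Python with only the standard library. This is a minimal sketch: `build_chat_request` and `chat` are hypothetical helpers, the base URL is the in-cluster address from the curl examples, and the long default timeout is an assumption to allow for a model switch on the first request.

```python
import json
import urllib.request

# In-cluster address used by the curl examples in this README.
BASE_URL = "http://snipsnap-api.snipsnap:8000"


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 600.0) -> dict:
    """Send the request and return the decoded JSON response.

    The timeout is deliberately generous (an assumption): switching
    models means stopping one pod and starting another.
    """
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, from a pod inside the cluster:
#   reply = chat("llama3", "Hello!")
#   print(reply["choices"][0]["message"]["content"])
```

The response indexing in the usage comment follows the OpenAI chat completions shape, which the proxy is described as being compatible with.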

Custom Resources

Model

Defines an LLM that can be loaded. Supports Ollama and vLLM engines.

apiVersion: snipsnap.xgeeks.com/v1
kind: Model
metadata:
  name: llama3
spec:
  url: "ollama://llama3"
  engine: OLlama
  cache:
    enabled: true
    storageSize: "20Gi"
  resources:
    limits:
      nvidia.com/gpu: "1"

Workspace

Tracks which model is currently active on the GPU.

apiVersion: snipsnap.xgeeks.com/v1
kind: Workspace
metadata:
  name: default
spec:
  activeModel: "llama3"

Development

# Generate CRDs and code
make generate manifests

# Run locally against a cluster
make run

# Run tests
make test

# Build container image
make docker-build IMG=snipsnap:dev

License

Apache License 2.0
