This repository contains a Docker Compose configuration for running Ollama with a FastAPI wrapper and a Caddy reverse proxy.
- Ollama Service
  - Base image: `ollama/ollama:latest` - provides the core LLM functionality
  - GPU support enabled (see the pre-flight check after this list)
  - Port: 11434
  - Environment variables:
    - `NVIDIA_VISIBLE_DEVICES`: Controls GPU visibility (default: `all`)
    - `OLLAMA_CONCURRENT_REQUESTS`: Number of concurrent requests (default: `1`)
    - `OLLAMA_QUEUE_ENABLED`: Queue system status (default: `true`)
- FastAPI Wrapper
  - Custom-built service using `Dockerfile.wrapper` - provides the API interface for Ollama
  - Port: 5000
  - Environment variables:
    - `PYTHONUNBUFFERED`: Set to `1` for unbuffered output
    - `SESSION_API_KEY`: Optional API key for session management
- Caddy Reverse Proxy
  - Custom-built service using `Dockerfile.caddy` - serves as the reverse proxy
  - Port: 3334 (configurable)
  - Environment variables:
    - `PUBLIC_ACCESS_PORT`: External port (default: `3334`)
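GPU passthrough to the Ollama container generally requires the NVIDIA Container Toolkit on the host. The toolkit requirement is an assumption based on the GPU-enabled setup rather than something stated in this README; a quick host-side pre-flight check:

```bash
# Confirm the NVIDIA driver is working on the host
nvidia-smi

# Confirm Docker has the NVIDIA runtime registered
# (provided by the NVIDIA Container Toolkit)
docker info --format '{{json .Runtimes}}' | grep -i nvidia
```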
- Clone this repository:

  ```bash
  git clone https://github.com/ClinicianFOCUS/local-llm-container.git
  cd local-llm-container
  ```

- Launch the services:

  ```bash
  docker-compose up -d
  ```
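Once the stack is up, you can confirm the containers are running and that Ollama can see the GPU. These are standard Compose and Docker commands; the container name `ollama-service` is taken from the CLI example below:

```bash
# Show the state of the services started by this compose project
docker-compose ps

# Follow logs from all services (Ctrl+C to stop following)
docker-compose logs -f

# Verify the GPU is visible inside the Ollama container
docker exec -it ollama-service nvidia-smi
```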
After the containers are deployed, you can launch models using either the CLI or the API:
Using the CLI:

- Connect to the Ollama container:

  ```bash
  docker exec -it ollama-service bash
  ```

- Pull your desired model:

  ```bash
  ollama pull gemma2:2b-instruct-q8_0  # or any other model
  ```

- Run the model:

  ```bash
  ollama run gemma2:2b-instruct-q8_0
  ```
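To confirm which models have been pulled, `ollama list` (a standard Ollama CLI command) can be run in the same container:

```bash
# List the models currently available in the Ollama container
docker exec -it ollama-service ollama list
```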
Using the API:

- Pull a model via the API:

  ```bash
  curl -X POST http://localhost:3334/api/pull \
    -H "Content-Type: application/json" \
    -d '{"name": "gemma2:2b-instruct-q8_0"}'
  ```

- Generate with the model:

  ```bash
  curl -X POST http://localhost:3334/api/generate \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gemma2:2b-instruct-q8_0",
      "prompt": "Your prompt here"
    }'
  ```

- Health check:

  ```bash
  curl http://localhost:3334/health
  ```
Use the `-k` flag with curl if you are using self-signed certificates.
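The same check works over HTTP: Ollama's API exposes `GET /api/tags` for listing local models. The example below assumes the Caddy proxy forwards all `/api/` routes to Ollama, as the pull and generate examples above suggest:

```bash
# List locally available models through the reverse proxy
# (add -k if Caddy serves a self-signed certificate)
curl http://localhost:3334/api/tags
```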
You can find available models in the Ollama model library: https://ollama.com/library

The following environment variables can be used to configure the services:
| Variable | Default | Description |
|---|---|---|
| NVIDIA_VISIBLE_DEVICES | all | GPU devices available to Ollama |
| OLLAMA_CONCURRENT_REQUESTS | 1 | Maximum concurrent requests |
| OLLAMA_QUEUE_ENABLED | true | Enable/disable request queue |
| SESSION_API_KEY | - | API key for FastAPI wrapper |
| PUBLIC_ACCESS_PORT | 3334 | External port for Caddy |
You can set these variables using the CLI:

Windows (PowerShell):

```powershell
$env:SESSION_API_KEY="MY_API_KEY_TO_USE__FOR_AUTHENTICATION"
```

Linux:

```bash
export SESSION_API_KEY=MY_API_KEY_TO_USE__FOR_AUTHENTICATION
```
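Alternatively, these settings can be kept in a `.env` file next to `docker-compose.yml`, which Docker Compose reads automatically for variable substitution. This is a sketch and assumes the compose file actually references these variables; the values shown are illustrative:

```
# .env (in the repository root)
NVIDIA_VISIBLE_DEVICES=all
OLLAMA_CONCURRENT_REQUESTS=2
OLLAMA_QUEUE_ENABLED=true
SESSION_API_KEY=MY_API_KEY_TO_USE__FOR_AUTHENTICATION
PUBLIC_ACCESS_PORT=3334
```

After editing `.env`, re-run `docker-compose up -d` to apply the changes.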
Access the LLM API through the Caddy reverse proxy:

- API Endpoint: `https://localhost:3334/api/`
- Docs: https://github.com/ollama/ollama/blob/main/docs/api.md
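If you set `SESSION_API_KEY`, requests through the wrapper need to present it. The header shown below (`Authorization: Bearer ...`) is an assumption for illustration only; check `Dockerfile.wrapper` and the wrapper source for the exact scheme it expects:

```bash
# Hypothetical authenticated request through the proxy;
# the Authorization header format is an assumption, not confirmed by this README
curl -k -X POST https://localhost:3334/api/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SESSION_API_KEY" \
  -d '{"model": "gemma2:2b-instruct-q8_0", "prompt": "Your prompt here"}'
```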
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.