MediaKit AI

MediaKit AI is an extensible, modular MVP for image capabilities (generation, editing, etc.) built with FastAPI. It leverages a pluggable provider architecture to support multiple local or remote AI backends easily (supporting local Stable Diffusion 1.5 via Hugging Face diffusers and text-to-image endpoints via ollama).

Architecture

mediakit-ai/
├── app/
│   ├── main.py                    # FastAPI app entry point
│   ├── core/
│   │   ├── config.py              # Settings (env vars, provider configs)
│   │   └── exceptions.py          # Custom app exceptions
│   │
│   ├── providers/                 # 🔑 THE HEART — swap AI backends here
│   │   ├── base.py                # Abstract ImageProvider interface
│   │   ├── registry.py            # Provider registry (factory pattern)
│   │   ├── ollama/
│   │   │   ├── __init__.py
│   │   │   └── provider.py        # OllamaProvider(ImageProvider)
│   │   └── diffusers/
│   │       └── provider.py        # DiffusersProvider(ImageProvider) - SD 1.5
│   │
│   ├── controllers/
│   │   └── image_controller.py    # Route handlers (thin layer)
│   │
│   ├── services/
│   │   └── image_service.py       # Business logic, provider orchestration
│   │
│   ├── schemas/
│   │   └── image_schemas.py       # Pydantic request/response models
│   │
│   └── routers/
│       └── image_router.py        # FastAPI router wiring
│
├── tests/
│   └── test_image_generation.py
├── .env.example
├── requirements.txt
└── README.md

Setup & Running

Install Dependencies
```
pip install -r requirements.txt
```
Configure Environment Copy the example environment file and configure it:
```
cp .env.example .env
```

Run the Server Start the FastAPI development server:

# Run via Python directly
python app/main.py

# Or run via Uvicorn
uvicorn app.main:app --reload

Verify the Endpoint Send a request to generate an image (supports negative_prompt, num_inference_steps, and guidance_scale):

curl -X POST "http://localhost:8000/api/v1/images/generate" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "A beautiful sunset over the mountains", "negative_prompt": "blurry, low quality", "width": 512, "height": 512, "num_inference_steps": 25, "guidance_scale": 7.5}' \
     --output output.png

Providers Configuration

The active provider is configured via environment variables:

IMAGE_PROVIDER=diffusers: Runs Stable Diffusion 1.5 locally (utilizing PyTorch). Optimized for systems with RAM/GPU constraints. Uses device-auto-detection (cuda, mps, or cpu) and attention-slicing.
IMAGE_PROVIDER=ollama: Communicates with a running Ollama server.
Additional providers can be registered in app/providers/registry.py and activated by setting IMAGE_PROVIDER.

Testing

Run the test suite using pytest:

pytest

Environment Variables

Use .env.example as a template — it is safe to check into source control and contains placeholders for sensitive values. Do NOT commit a filled .env with real credentials or API tokens.

Important variables:

IMAGE_PROVIDER — ollama or diffusers (set to ollama to enable OCR and Ollama-based image generation).
OLLAMA_BASE_URL — HTTP URL of your local Ollama server (default: http://localhost:11434).
OLLAMA_MODEL — default image generation model (e.g. x/flux2-klein:4b).
OLLAMA_OCR_MODEL — OCR model name (e.g. deepseek-ocr:latest).
HF_API_TOKEN — (optional) Hugging Face API token if using HF features.

To create a local .env from the example:

cp .env.example .env
# edit .env and provide any required tokens or model names

OCR (deepseek-ocr)

The project exposes an OCR endpoint that uses the configured OLLAMA_OCR_MODEL via the Ollama HTTP API.

FastAPI route: POST /api/v1/images/ocr — accepts a multipart file upload (file).

Example curl request:

curl -X POST "http://localhost:8000/api/v1/images/ocr" \
   -F "file=@img.png" \
   -H "Accept: application/json"

If the server returns a 502 (ProviderError), check that your Ollama server is running and that the OCR model is loaded. Verify locally with:

# start/load the model (interactive)
ollama run deepseek-ocr:latest

# or list models (if supported by your Ollama version)
ollama list

For quick local debugging you can run the helper script test6.py in the repository root; it will call the Ollama HTTP API and print the extracted OCR text for img.png:

python test6.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MediaKit AI

Architecture

Setup & Running

Providers Configuration

Testing

Environment Variables

OCR (deepseek-ocr)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
app		app
tests		tests
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

MediaKit AI

Architecture

Setup & Running

Providers Configuration

Testing

Environment Variables

OCR (deepseek-ocr)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages