Search by vibe. Generate by blueprint.
Every musician has a tune in mind.
What if you could search and create music by feel, generate songs by vibe, play a song to create music Shazam-style, fuse genres into entirely new sounds, and transform lyrics into fully composed songs?
Meet GrooveForge — THE ULTIMATE AI TOOLKIT FOR ORIGINAL MUSIC CREATION.
- What is GrooveForge?
- How It Works
- Input Modes
- Architecture
- Datasets
- Data Pipeline
- Generation Modes
- Tech Stack
- Screenshots
- Running Locally
- Copyright-Safe by Design
## What is GrooveForge?

GrooveForge is a retrieval-augmented music creation system.
Instead of describing music in the abstract, you search by the actual structural properties that make music sound the way it does — key, tempo, mood, instrumentation, lyrical themes. GrooveForge gives you four powerful ways to create:
- Vibe Graph — Click genre, mood, tempo, key, and theme nodes to compose a vibe
- Sound Match — Play a song. GrooveForge extracts its sonic fingerprint and generates something completely original in the same feel — Shazam, but for creation
- Text-to-Music — Describe what you want to create using natural language
- Lyrics-to-Music — Transform written lyrics into a fully composed song
At its core, GrooveForge indexes millions of audio blueprints enriched with features that define a song's DNA: genre, mood, key, tempo, energy, danceability, acousticness, valence, instrumentalness, and vocal characteristics. By retrieving and analyzing the closest matches, it generates original compositions grounded in real musical structure — ensuring precision, originality, and creative control.
Every generated track comes with a visible reasoning trail — the exact blueprint cards that shaped it — so you can see why it sounds the way it does. No black boxes. No hallucinated characteristics.
## How It Works

1. Describe your vibe — Select nodes in the graph, type a natural-language description, paste original lyrics, or just play a song you love and let GrooveForge extract the vibe.
2. Retrieve blueprints — Your input is searched across millions of indexed tracks to find the closest musical matches by feel, genre, mood, key, tempo, and instrumentation. The top 5–10 blueprints are surfaced and ranked.
3. Aggregate traits — The retrieved blueprints are collapsed into a generation profile: average BPM, dominant key and mode, most common genre and mood, merged instrumentation.
4. Generate your track — Gemini synthesizes a music prompt strictly from the retrieved blueprint traits and sends it to ElevenLabs Music API to produce an original composition. In Advanced mode, lyrics are placed section by section — never mixed into style guidance.
Every track includes a visible reasoning trail — the exact blueprint cards and aggregated profile that drove the generation.
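The trait aggregation in step 3 can be sketched in a few lines. This is a minimal illustration: the `Blueprint` record and its field names are simplified stand-ins, not the actual backend schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Blueprint:
    """Illustrative blueprint record; field names are assumptions."""
    bpm: float
    key: str
    mode: str
    genre: str
    mood: str
    instruments: list[str]


def aggregate(blueprints: list[Blueprint]) -> dict:
    """Collapse the retrieved blueprints into one generation profile:
    average BPM, dominant key/mode, most common genre and mood,
    and the merged instrument set."""
    return {
        "bpm": round(sum(b.bpm for b in blueprints) / len(blueprints)),
        "key_mode": Counter((b.key, b.mode) for b in blueprints).most_common(1)[0][0],
        "genre": Counter(b.genre for b in blueprints).most_common(1)[0][0],
        "mood": Counter(b.mood for b in blueprints).most_common(1)[0][0],
        "instruments": sorted({i for b in blueprints for i in b.instruments}),
    }
```

The profile, not the raw blueprints, is what feeds prompt synthesis, which keeps the generation step small and deterministic.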
## Input Modes

### Vibe Graph

Click genre, mood, tempo, key, mode, instrumentation, and theme nodes to compose a vibe. Every node selection tightens the search query. The system maps your picks to a hybrid retrieval query plus metadata filters, surfaces the closest blueprints, and generates.
### Text-to-Music

Type anything: "moody synthwave, 110 BPM, instrumental" or "upbeat pop, female vocals, summer road trip". Your description is embedded and searched across millions of indexed tracks to find the closest blueprint matches.
### Lyrics-to-Music

Paste original lyrics. Gemini analyzes emotional tone, themes, energy level, and rhythmic structure. The derived traits drive blueprint retrieval — your lyrics never contaminate the style guidance. In Advanced mode, lyrics are placed in ElevenLabs `lines` fields per section; style guidance comes from the blueprints only.
### Sound Match

Just hit play on any song you love. Gemini extracts the sonic fingerprint: BPM, key, mode, mood, texture, instrumentation. Those traits drive blueprint retrieval across millions of tracks, and GrooveForge generates something completely original in the same vibe. The artist name and song title never reach ElevenLabs — only the derived feel does.
### Track History

Every track you generate is saved locally (localStorage). Replay any track, rename it, or download the MP3. History persists across sessions.
## Architecture

See ARCHITECTURE.md for the full system diagram and endpoint reference.
## Datasets

GrooveForge's blueprint index is built on four open datasets. Only structured metadata and derived features are used — no audio files are stored or processed.
| Dataset | Size | What it contributes |
|---|---|---|
| Million Song Dataset (MSD) | ~1M tracks | The backbone. Provides BPM, key, mode, loudness, artist familiarity, and release year for a million songs. |
| LP-MusicCaps-MSD | ~513K tracks | MSD tracks enriched with human-written captions from the MusicCaps annotation project. It provides rich natural-language descriptions of mood, texture, instrumentation, and genre — the primary retrieval anchor for each blueprint's text_description. |
| Free Music Archive (FMA) | ~106K tracks | Creative Commons licensed tracks with genre labels, Echonest audio features (valence, energy, danceability, instrumentalness, acousticness), and track-level metadata. Covers a wide range of independent and niche genres. |
| MusicCaps | ~5.5K tracks | A high-quality, human-annotated evaluation set from Google DeepMind. Used to validate caption quality and tag vocabulary, and instrumental in shaping the genre/mood classification vocabulary. |
Together these datasets cover mainstream, indie, electronic, classical, world music, and everything in between — giving retrieval broad coverage across moods, genres, keys, and tempos.
## Data Pipeline

The blueprint index was built in three offline stages. All scripts live in `backend/scripts/`.
### ingest_blueprints.py

- LP-MusicCaps-MSD (513,977 tracks) → `data/blueprints_lp_msd.parquet`
- FMA (106,574 tracks) → `data/blueprints_fma.parquet`
For LP-MusicCaps-MSD:

- Tags parsed and classified into genre / mood / themes via vocabulary sets
- Vocal type inferred from tag strings (`female vocal`, `male vocal`, `instrumental`)
- Energy derived from loudness: `(loudness + 25) / 25`, clamped to `[0, 1]`
- `text` field assembled from `caption_summary` + `caption_writing` + tags + key/mode/BPM
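The loudness-to-energy mapping above is a simple affine rescale with clamping. A minimal sketch, assuming MSD loudness values sit roughly in the −25..0 dB range:

```python
def energy_from_loudness(loudness_db: float) -> float:
    """Map MSD loudness (roughly -25..0 dB) to a 0..1 energy score,
    clamping values that fall outside the expected range."""
    return min(1.0, max(0.0, (loudness_db + 25) / 25))
```

So a track at −25 dB maps to energy 0.0, one at 0 dB to 1.0, and outliers are clamped rather than extrapolated.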
For FMA:

- Genre from `genre_top`; mood derived from Echonest valence thresholds (> 0.6 → upbeat, < 0.3 → melancholic)
- Vocal type from instrumentalness threshold (> 0.8 → instrumental)
- `text` field assembled from title + genre + descriptors + BPM + mood
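The FMA threshold rules can be sketched directly; the `"neutral"` and `"vocal"` fallback labels for the middle bands are assumptions, not confirmed script behavior:

```python
def fma_mood(valence: float) -> str:
    """Echonest valence thresholds: > 0.6 upbeat, < 0.3 melancholic.
    The "neutral" label for the middle band is an assumed fallback."""
    if valence > 0.6:
        return "upbeat"
    if valence < 0.3:
        return "melancholic"
    return "neutral"


def fma_vocal_type(instrumentalness: float) -> str:
    """Instrumentalness > 0.8 is treated as instrumental;
    the "vocal" fallback is an assumption."""
    return "instrumental" if instrumentalness > 0.8 else "vocal"
```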
### embed_blueprints.py

- `blueprints_lp_msd.parquet` → Turbopuffer namespace `lp_msd_minilm` (513,977 records)
- `blueprints_fma.parquet` → Turbopuffer namespace `fma_minilm` (106,574 records)
- Embedding model: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, L2-normalized)
  - During data pipeline (local): the model was run locally via the `sentence-transformers` Python package — no GPU required. `all-MiniLM-L6-v2` is small enough to encode comfortably on CPU, producing ~1,000 vectors/sec and making it practical to embed all 620K+ blueprint records in a single offline run. Running it locally meant zero API cost for the bulk embedding pass and no rate-limit concerns.
  - At inference (OpenRouter): for query embedding at request time, we switched to the same model served via the OpenRouter API. This avoids bundling the model weights in the Railway server container, keeps the deployment lightweight, and centralizes access with a single API key. The vectors are dimensionally identical (384-dim, L2-normalized), so ANN queries work seamlessly against the locally built index.
- Batch encode: 256 rows per encode call; upsert 500 rows per Turbopuffer write call
- Schema: `text` (full-text search enabled), plus filterable string attributes (`source`, `genre`, `mood`, `vocal_type`, `key`, `mode`, `mode_key`) and numeric attributes (`bpm`, `year`, `energy`, `acousticness`, `valence`, `danceability`, `instrumentalness`, `artist_familiarity`)
- Checkpointed: progress saved to `data/.embed_checkpoints/` so a killed run resumes from the last successful batch
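The batching and checkpoint-resume behavior can be sketched generically. The real script calls the sentence-transformers encoder and the Turbopuffer client; here `encode` and `upsert` are injected stand-ins, and the checkpoint file format is an assumption:

```python
import json
import pathlib

ENCODE_BATCH = 256   # rows per embedding call
UPSERT_BATCH = 500   # rows per Turbopuffer write call


def embed_all(rows, encode, upsert, checkpoint: pathlib.Path) -> None:
    """Encode rows in small batches, upsert in larger batches, and persist
    progress so a killed run resumes from the last fully upserted record."""
    done = json.loads(checkpoint.read_text())["done"] if checkpoint.exists() else 0
    buffer = []  # (row, vector) records awaiting upsert
    for i in range(done, len(rows), ENCODE_BATCH):
        batch = rows[i:i + ENCODE_BATCH]
        buffer.extend(zip(batch, encode(batch)))
        while len(buffer) >= UPSERT_BATCH:
            upsert(buffer[:UPSERT_BATCH])
            del buffer[:UPSERT_BATCH]
            done += UPSERT_BATCH
            # only count rows that were actually written
            checkpoint.write_text(json.dumps({"done": done}))
    if buffer:  # flush the final partial batch
        upsert(list(buffer))
        checkpoint.write_text(json.dumps({"done": done + len(buffer)}))
```

Recording progress only after each successful upsert (rather than after each encode) means a crash can re-encode a few hundred rows but never skips unwritten ones.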
Both namespaces are queried concurrently at runtime via `asyncio.gather`. Results are merged with Reciprocal Rank Fusion (RRF, k=60).
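The RRF merge fits in a few lines; `rrf_merge` below is an illustrative helper, not the production code:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(doc) = sum over result lists of
    1 / (k + rank), with rank starting at 1. Documents appearing high
    in multiple lists accumulate the largest combined scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With k=60, a top rank in one list cannot dominate consistent mid-rank appearances across both namespaces, which is why RRF is a common choice for fusing heterogeneous result lists.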
## Generation Modes

| Mode | Description |
|---|---|
| Simple (Prompt) | Fast iteration — one text prompt derived from aggregated blueprint traits sent to ElevenLabs |
| Advanced (Composition Plan) | Structured songs — section-level control (intro/verse/chorus/bridge/outro), lyric placement per section, local style guides |
Both modes support Review Before Generate — a dry-run that synthesizes and shows you the exact prompt or composition plan before committing to an ElevenLabs call. Approve or cancel.
Composition plan structure:
- `positive_global_styles` — genre, mood, tempo, key from aggregated blueprints
- `positive_local_styles` — per-section style directions
- `lines` — user lyrics only, placed per section (never mixed with style guidance)
- `negative_global_styles` — traits to suppress
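A composition plan built from those fields might look like the sketch below. The section structure (`sections`, `section_name`) and the example values are illustrative and may differ from the exact ElevenLabs schema; the point is that lyrics live only in `lines`, never in any styles list:

```python
plan = {
    "positive_global_styles": ["indie pop", "dreamy", "104 BPM", "key of A minor"],
    "negative_global_styles": ["distorted guitar", "aggressive drums"],
    "sections": [
        {
            "section_name": "Verse 1",
            "positive_local_styles": ["sparse piano", "soft vocals"],
            "lines": ["First line of the user's lyrics", "Second line"],
        },
        {
            "section_name": "Chorus",
            "positive_local_styles": ["layered synths", "wide stereo pads"],
            "lines": ["Chorus lyric line"],
        },
    ],
}
```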
## Tech Stack

| Category | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, TailwindCSS, Framer Motion, react-flow, Radix UI |
| Backend | FastAPI, Python 3.13+, uv, Pydantic, asyncio |
| AI & Music | ElevenLabs Music API (prompt + composition-plan), Google Gemini 2.5-flash |
| Retrieval | Turbopuffer — ANN + BM25 hybrid, metadata filters, RRF merge across 2 namespaces |
| Embeddings | all-MiniLM-L6-v2 via OpenRouter API (384-dim, no local GPU needed) |
| Data Sources | LP-MusicCaps-MSD (513K), Free Music Archive (106K), MSD full (1M), MusicCaps (5.5K) |
| Deployment | Railway (backend, Hobby tier) + Vercel (frontend, SPA rewrite) |
## Screenshots

- Blueprint Cards
- Generated Track
- Review Composition Plan
- Generation Overlay
## Running Locally

Prerequisites: Python 3.11+, Node 18+, uv
```bash
# Backend
cd backend
uv sync
cp .env.example .env   # fill in API keys
uv run uvicorn app.main:app --reload --port 8000
```

```bash
# Frontend (separate terminal)
cd frontend
npm install
npm run dev
# Opens at http://localhost:8080
```

Environment variables (`backend/.env`):
```
ELEVENLABS_API_KEY=...
TURBOPUFFER_API_KEY=...
OPENROUTER_API_KEY=...
GEMINI_API_KEY=...
```

Data pipeline (one-time setup — only needed to rebuild the Turbopuffer index):
```bash
cd backend
uv run python scripts/ingest_blueprints.py   # dataset → blueprint records
uv run python scripts/embed_blueprints.py    # embed + upsert into Turbopuffer
```

The Turbopuffer namespaces (`lp_msd_minilm`, `fma_minilm`) are already populated in production. You only need to run the data pipeline if you're rebuilding the index from scratch.
## Copyright-Safe by Design

No copyrighted material ever reaches ElevenLabs — by design, not accident.
- Metadata only — blueprints are structured features (BPM, key, genre, energy) and human-written captions. No audio is stored or processed.
- Artist & title firewall — names and titles are stripped before anything reaches Gemini or ElevenLabs. Only derived traits flow into generation.
- `text_description` excluded from LLM context — free-text fields stay retrieval-only (BM25) and are never passed to Gemini, since they may embed real artist references.
- Original output — ElevenLabs generates a brand-new composition. Blueprint retrieval shapes the style; it reproduces nothing.
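The artist-and-title firewall amounts to an allowlist over blueprint fields. A minimal sketch, assuming a flat blueprint dict; `SAFE_TRAITS` and the field names are illustrative, not the actual backend schema:

```python
# Fields allowed to flow into prompt synthesis (assumed allowlist).
SAFE_TRAITS = {"bpm", "key", "mode", "genre", "mood", "energy", "instruments"}


def firewall(blueprint: dict) -> dict:
    """Drop identifying fields (artist, title, text_description) so that
    only derived musical traits reach Gemini and ElevenLabs."""
    return {k: v for k, v in blueprint.items() if k in SAFE_TRAITS}
```

An allowlist is the safer shape here: a new metadata field added to the index is excluded by default until someone deliberately marks it safe.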
This project is licensed under the MIT License.