Search by vibe. Generate by blueprint.
Every musician has a tune in mind.
What if you could search and create music by feel, generate songs by vibe, play a song to create music Shazam-style, fuse genres into entirely new sounds, and transform lyrics into fully composed songs?
Meet GrooveForge — THE ULTIMATE AI TOOLKIT FOR ORIGINAL MUSIC CREATION.
- What is GrooveForge?
- How It Works
- Input Modes
- Architecture
- Datasets
- Data Pipeline
- Generation Modes
- Tech Stack
- Screenshots
- Running Locally
- Copyright-Safe by Design
## What is GrooveForge?

GrooveForge is a retrieval-augmented music creation system.
Instead of describing music in the abstract, you search by the actual structural properties that make music sound the way it does — key, tempo, mood, instrumentation, lyrical themes. GrooveForge gives you four powerful ways to create:
- Vibe Graph — Click genre, mood, tempo, key, and theme nodes to compose a vibe
- Sound Match — Play a song. GrooveForge extracts its sonic fingerprint and generates something completely original in the same feel — Shazam, but for creation
- Text-to-Music — Describe what you want to create using natural language
- Lyrics-to-Music — Transform written lyrics into a fully composed song
At its core, GrooveForge indexes millions of audio blueprints enriched with features that define a song's DNA: genre, mood, key, tempo, energy, danceability, acousticness, valence, instrumentalness, and vocal characteristics. By retrieving and analyzing the closest matches, it generates original compositions grounded in real musical structure — ensuring precision, originality, and creative control.
Every generated track comes with a visible reasoning trail — the exact blueprint cards that shaped it — so you can see why it sounds the way it does. No black boxes. No hallucinated characteristics.
## How It Works

1. Describe your vibe — Select nodes in the graph, type a natural-language description, paste original lyrics, or just play a song you love and let GrooveForge extract the vibe.
2. Retrieve blueprints — Your input is searched across millions of indexed tracks to find the closest musical matches by feel, genre, mood, key, tempo, and instrumentation. The top 5–10 blueprints are surfaced and ranked.
3. Aggregate traits — The retrieved blueprints are collapsed into a generation profile: average BPM, dominant key and mode, most common genre and mood, merged instrumentation.
4. Generate your track — Gemini synthesizes a music prompt strictly from the retrieved blueprint traits and sends it to ElevenLabs Music API to produce an original composition. In Advanced mode, lyrics are placed section by section — never mixed into style guidance.
Every track includes a visible reasoning trail — the exact blueprint cards and aggregated profile that drove the generation.
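The trait aggregation in step 3 can be sketched in a few lines. This is a minimal illustration: the `Blueprint` record and its field names are simplified stand-ins, not the actual backend schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Blueprint:
    """Illustrative blueprint record; field names are assumptions."""
    bpm: float
    key: str
    mode: str
    genre: str
    mood: str
    instruments: list[str]


def aggregate(blueprints: list[Blueprint]) -> dict:
    """Collapse the retrieved blueprints into one generation profile:
    average BPM, dominant key/mode, most common genre and mood,
    and the merged instrument set."""
    return {
        "bpm": round(sum(b.bpm for b in blueprints) / len(blueprints)),
        "key_mode": Counter((b.key, b.mode) for b in blueprints).most_common(1)[0][0],
        "genre": Counter(b.genre for b in blueprints).most_common(1)[0][0],
        "mood": Counter(b.mood for b in blueprints).most_common(1)[0][0],
        "instruments": sorted({i for b in blueprints for i in b.instruments}),
    }
```

The profile, not the raw blueprints, is what feeds prompt synthesis, which keeps the generation step small and deterministic.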
## Input Modes

### Vibe Graph

Click genre, mood, tempo, key, mode, instrumentation, and theme nodes to compose a vibe. Every node selection tightens the search query. The system maps your picks to a hybrid retrieval query plus metadata filters, surfaces the closest blueprints, and generates.
### Text-to-Music

Type anything: "moody synthwave, 110 BPM, instrumental" or "upbeat pop, female vocals, summer road trip". Your description is embedded and searched across millions of indexed tracks to find the closest blueprint matches.
### Lyrics-to-Music

Paste original lyrics. Gemini analyzes emotional tone, themes, energy level, and rhythmic structure. The derived traits drive blueprint retrieval — your lyrics never contaminate the style guidance. In Advanced mode, lyrics are placed in ElevenLabs `lines` fields per section; style guidance comes from the blueprints only.
### Sound Match

Just hit play on any song you love. Gemini extracts the sonic fingerprint: BPM, key, mode, mood, texture, instrumentation. Those traits drive blueprint retrieval across millions of tracks, and GrooveForge generates something completely original in the same vibe. The artist name and song title never reach ElevenLabs — only the derived feel does.
### Track History

Every track you generate is saved locally (localStorage). Replay any track, rename it, or download the MP3. History persists across sessions.
## Architecture

See ARCHITECTURE.md for the full system diagram and endpoint reference.
## Datasets

GrooveForge's blueprint index is built on four open datasets. Only structured metadata and derived features are used — no audio files are stored or processed.
| Dataset | Size | What it contributes |
|---|---|---|
| Million Song Dataset (MSD) | ~1M tracks | The backbone. Provides BPM, key, mode, loudness, artist familiarity, and release year for a million songs. |
| LP-MusicCaps-MSD | ~513K tracks | MSD tracks enriched with human-written captions from the MusicCaps annotation project. It provides rich natural-language descriptions of mood, texture, instrumentation, and genre — the primary retrieval anchor for each blueprint's text_description. |
| Free Music Archive (FMA) | ~106K tracks | Creative Commons licensed tracks with genre labels, Echonest audio features (valence, energy, danceability, instrumentalness, acousticness), and track-level metadata. Covers a wide range of independent and niche genres. |
| MusicCaps | ~5.5K tracks | A high-quality, human-annotated evaluation set from Google DeepMind. Used to validate caption quality and tag vocabulary, and instrumental in shaping the genre/mood classification vocabulary. |
Together these datasets cover mainstream, indie, electronic, classical, world music, and everything in between — giving retrieval broad coverage across moods, genres, keys, and tempos.
## Data Pipeline

The blueprint index was built in three offline stages. All scripts live in `backend/scripts/`.
### ingest_blueprints.py

- LP-MusicCaps-MSD (513,977 tracks) → `data/blueprints_lp_msd.parquet`
- FMA (106,574 tracks) → `data/blueprints_fma.parquet`
For LP-MusicCaps-MSD:

- Tags parsed and classified into genre / mood / themes via vocabulary sets
- Vocal type inferred from tag strings (`female vocal`, `male vocal`, `instrumental`)
- Energy derived from loudness: `(loudness + 25) / 25`, clamped to `[0, 1]`
- `text` field assembled from `caption_summary` + `caption_writing` + tags + key/mode/BPM
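The loudness-to-energy mapping above is a simple affine rescale with clamping. A minimal sketch, assuming MSD loudness values sit roughly in the −25..0 dB range:

```python
def energy_from_loudness(loudness_db: float) -> float:
    """Map MSD loudness (roughly -25..0 dB) to a 0..1 energy score,
    clamping values that fall outside the expected range."""
    return min(1.0, max(0.0, (loudness_db + 25) / 25))
```

So a track at −25 dB maps to energy 0.0, one at 0 dB to 1.0, and outliers are clamped rather than extrapolated.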
For FMA:

- Genre from `genre_top`; mood derived from Echonest valence thresholds (> 0.6 → upbeat, < 0.3 → melancholic)
- Vocal type from instrumentalness threshold (> 0.8 → instrumental)
- `text` field assembled from title + genre + descriptors + BPM + mood
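The FMA threshold rules can be sketched directly; the `"neutral"` and `"vocal"` fallback labels for the middle bands are assumptions, not confirmed script behavior:

```python
def fma_mood(valence: float) -> str:
    """Echonest valence thresholds: > 0.6 upbeat, < 0.3 melancholic.
    The "neutral" label for the middle band is an assumed fallback."""
    if valence > 0.6:
        return "upbeat"
    if valence < 0.3:
        return "melancholic"
    return "neutral"


def fma_vocal_type(instrumentalness: float) -> str:
    """Instrumentalness > 0.8 is treated as instrumental;
    the "vocal" fallback is an assumption."""
    return "instrumental" if instrumentalness > 0.8 else "vocal"
```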
### embed_blueprints.py

- `blueprints_lp_msd.parquet` → Turbopuffer namespace `lp_msd_minilm` (513,977 records)
- `blueprints_fma.parquet` → Turbopuffer namespace `fma_minilm` (106,574 records)
- Embedding model: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, L2-normalized)
  - During data pipeline (local): the model was run locally via the `sentence-transformers` Python package — no GPU required. `all-MiniLM-L6-v2` is small enough to encode comfortably on CPU, producing ~1,000 vectors/sec and making it practical to embed all 620K+ blueprint records in a single offline run. Running it locally meant zero API cost for the bulk embedding pass and no rate-limit concerns.
  - At inference (OpenRouter): for query embedding at request time, we switched to the same model served via the OpenRouter API. This avoids bundling the model weights in the Railway server container, keeps the deployment lightweight, and centralizes access with a single API key. The vectors are dimensionally identical (384-dim, L2-normalized), so ANN queries work seamlessly against the locally built index.
- Batch encode: 256 rows per encode call; upsert 500 rows per Turbopuffer write call
- Schema: `text` (full-text search enabled), plus filterable string attributes (`source`, `genre`, `mood`, `vocal_type`, `key`, `mode`, `mode_key`) and numeric attributes (`bpm`, `year`, `energy`, `acousticness`, `valence`, `danceability`, `instrumentalness`, `artist_familiarity`)
- Checkpointed: progress saved to `data/.embed_checkpoints/` so a killed run resumes from the last successful batch
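The batching and checkpoint-resume behavior can be sketched generically. The real script calls the sentence-transformers encoder and the Turbopuffer client; here `encode` and `upsert` are injected stand-ins, and the checkpoint file format is an assumption:

```python
import json
import pathlib

ENCODE_BATCH = 256   # rows per embedding call
UPSERT_BATCH = 500   # rows per Turbopuffer write call


def embed_all(rows, encode, upsert, checkpoint: pathlib.Path) -> None:
    """Encode rows in small batches, upsert in larger batches, and persist
    progress so a killed run resumes from the last fully upserted record."""
    done = json.loads(checkpoint.read_text())["done"] if checkpoint.exists() else 0
    buffer = []  # (row, vector) records awaiting upsert
    for i in range(done, len(rows), ENCODE_BATCH):
        batch = rows[i:i + ENCODE_BATCH]
        buffer.extend(zip(batch, encode(batch)))
        while len(buffer) >= UPSERT_BATCH:
            upsert(buffer[:UPSERT_BATCH])
            del buffer[:UPSERT_BATCH]
            done += UPSERT_BATCH
            # only count rows that were actually written
            checkpoint.write_text(json.dumps({"done": done}))
    if buffer:  # flush the final partial batch
        upsert(list(buffer))
        checkpoint.write_text(json.dumps({"done": done + len(buffer)}))
```

Recording progress only after each successful upsert (rather than after each encode) means a crash can re-encode a few hundred rows but never skips unwritten ones.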
Both namespaces are queried concurrently at runtime via `asyncio.gather`. Results are merged with Reciprocal Rank Fusion (RRF, k=60).
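The RRF merge fits in a few lines; `rrf_merge` below is an illustrative helper, not the production code:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(doc) = sum over result lists of
    1 / (k + rank), with rank starting at 1. Documents appearing high
    in multiple lists accumulate the largest combined scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With k=60, a top rank in one list cannot dominate consistent mid-rank appearances across both namespaces, which is why RRF is a common choice for fusing heterogeneous result lists.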
## Generation Modes

| Mode | Description |
|---|---|
| Simple (Prompt) | Fast iteration — one text prompt derived from aggregated blueprint traits sent to ElevenLabs |
| Advanced (Composition Plan) | Structured songs — section-level control (intro/verse/chorus/bridge/outro), lyric placement per section, local style guides |
Both modes support Review Before Generate — a dry-run that synthesizes and shows you the exact prompt or composition plan before committing to an ElevenLabs call. Approve or cancel.
Composition plan structure:
- `positive_global_styles` — genre, mood, tempo, key from aggregated blueprints
- `positive_local_styles` — per-section style directions
- `lines` — user lyrics only, placed per section (never mixed with style guidance)
- `negative_global_styles` — traits to suppress
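A composition plan built from those fields might look like the sketch below. The section structure (`sections`, `section_name`) and the example values are illustrative and may differ from the exact ElevenLabs schema; the point is that lyrics live only in `lines`, never in any styles list:

```python
plan = {
    "positive_global_styles": ["indie pop", "dreamy", "104 BPM", "key of A minor"],
    "negative_global_styles": ["distorted guitar", "aggressive drums"],
    "sections": [
        {
            "section_name": "Verse 1",
            "positive_local_styles": ["sparse piano", "soft vocals"],
            "lines": ["First line of the user's lyrics", "Second line"],
        },
        {
            "section_name": "Chorus",
            "positive_local_styles": ["layered synths", "wide stereo pads"],
            "lines": ["Chorus lyric line"],
        },
    ],
}
```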
## Tech Stack

| Category | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, TailwindCSS, Framer Motion, react-flow, Radix UI |
| Backend | FastAPI, Python 3.13+, uv, Pydantic, asyncio |
| AI & Music | ElevenLabs Music API (prompt + composition-plan), Google Gemini 2.5-flash |
| Retrieval | Turbopuffer — ANN + BM25 hybrid, metadata filters, RRF merge across 2 namespaces |
| Embeddings | all-MiniLM-L6-v2 via OpenRouter API (384-dim, no local GPU needed) |
| Data Sources | LP-MusicCaps-MSD (513K), Free Music Archive (106K), MSD full (1M), MusicCaps (5.5K) |
| Deployment | Railway (backend, Hobby tier) + Vercel (frontend, SPA rewrite) |
## Screenshots

- Blueprint Cards
- Generated Track
- Review Composition Plan
- Generation Overlay
## Running Locally

Prerequisites: Python 3.11+, Node 18+, uv
```bash
# Backend
cd backend
uv sync
cp .env.example .env   # fill in API keys
uv run uvicorn app.main:app --reload --port 8000
```

```bash
# Frontend (separate terminal)
cd frontend
npm install
npm run dev
# Opens at http://localhost:8080
```

Environment variables (`backend/.env`):
```
ELEVENLABS_API_KEY=...
TURBOPUFFER_API_KEY=...
OPENROUTER_API_KEY=...
GEMINI_API_KEY=...
```

Data pipeline (one-time setup — only needed to rebuild the Turbopuffer index):
```bash
cd backend
uv run python scripts/ingest_blueprints.py   # dataset → blueprint records
uv run python scripts/embed_blueprints.py    # embed + upsert into Turbopuffer
```

The Turbopuffer namespaces (`lp_msd_minilm`, `fma_minilm`) are already populated in production. You only need to run the data pipeline if you're rebuilding the index from scratch.
## Copyright-Safe by Design

No copyrighted material ever reaches ElevenLabs — by design, not accident.
- Metadata only — blueprints are structured features (BPM, key, genre, energy) and human-written captions. No audio is stored or processed.
- Artist & title firewall — names and titles are stripped before anything reaches Gemini or ElevenLabs. Only derived traits flow into generation.
- `text_description` excluded from LLM context — free-text fields stay retrieval-only (BM25) and are never passed to Gemini, since they may embed real artist references.
- Original output — ElevenLabs generates a brand-new composition. Blueprint retrieval shapes the style; it reproduces nothing.
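The artist-and-title firewall amounts to an allowlist over blueprint fields. A minimal sketch, assuming a flat blueprint dict; `SAFE_TRAITS` and the field names are illustrative, not the actual backend schema:

```python
# Fields allowed to flow into prompt synthesis (assumed allowlist).
SAFE_TRAITS = {"bpm", "key", "mode", "genre", "mood", "energy", "instruments"}


def firewall(blueprint: dict) -> dict:
    """Drop identifying fields (artist, title, text_description) so that
    only derived musical traits reach Gemini and ElevenLabs."""
    return {k: v for k, v in blueprint.items() if k in SAFE_TRAITS}
```

An allowlist is the safer shape here: a new metadata field added to the index is excluded by default until someone deliberately marks it safe.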
This project is licensed under the MIT License.