4 changes: 2 additions & 2 deletions README.md
@@ -142,7 +142,7 @@ A chain of 24 FFmpeg effects covers a mastering chain, compression, highpass and

<p align="center"><img src="docs/readme/dj.png" alt="Two-deck DJ console with jog wheels, the central mixer, and the FX rack" width="820"></p>

Two decks run from a pro layout with jog wheels, a central mixer, scrolling waveform overviews, and a track browser, loading from the library, a saved set, or an online import. The engine handles octave-aware beatmatch sync with a continuous lock, key-lock, a 3-band EQ, a single-knob filter, and channel trim with auto-gain. Performance controls cover four hotcues, beat loops, momentary loop rolls, slip mode, beat jumps, and quantize. The FX rack adds a flanger, an impulse-response reverb, and a resonant wah per deck, with a master limiter on the DJ bus. Live stems ride on per-stem faders, and cue output pre-listens a deck through a headphone device chosen with `setSinkId`. Automix sequences and crossfades the set on its own, a ten-pad sampler bank fires one-shots, and a Next staging lane queues upcoming tracks. Design Mode turns the console into a hand-arranged layout that persists and exports.
Two decks run from a pro layout with jog wheels, a central mixer, scrolling waveform overviews, and a track browser, loading from the library, a saved set, or an online import. The engine handles octave-aware beatmatch sync with a continuous lock, key-lock, a 3-band EQ, a single-knob filter, and channel trim with auto-gain. Performance controls cover four hotcues, beat loops, momentary loop rolls, slip mode, beat jumps, and quantize. The FX rack adds a flanger, an impulse-response reverb, and a resonant wah per deck, with a master limiter on the DJ bus. Live stems ride on per-stem faders, and cue output pre-listens a deck through a headphone device chosen with `setSinkId`. Automix sequences and crossfades the set on its own, and it can be seeded and started from the Library's SUGGEST playlist. A ten-pad sampler bank fires one-shots, and a Next staging lane queues upcoming tracks. Design Mode turns the console into a hand-arranged layout that persists and exports.

### VJ visual engine

@@ -172,7 +172,7 @@ Every track and the relationships between them render as an interactive force-di
<img src="docs/readme/catalogue.png" alt="Cross-provider Catalogue gallery with provider badges and inspector" width="410">
</p>

The library lives on the backend, with audio on disk, metadata in `data/library.db`, and access over `/api/library/*`. Every render saves automatically with its prompt, model, duration, steps, CFG, seed, MIME type, and timestamp. List and grid views, full-text search, a favorites filter, and sorting by newest, duration, or title organize the collection, and each row plays inline with download, delete, favorite, and send-to-editor actions plus a details panel. The Catalogue view adds a cross-provider gallery with provider badges, an inspector with on-demand spectrograms, and a lineage panel, and it runs Suno cover and mashup from any entry.
The library lives on the backend, with audio on disk, metadata in `data/library.db`, and access over `/api/library/*`. Every render saves automatically with its prompt, model, duration, steps, CFG, seed, MIME type, and timestamp. List and grid views, full-text search, a favorites filter, and sorting by newest, duration, title, or play count organize the collection, and each row plays inline with download, delete, favorite, and send-to-editor actions plus a details panel. A per-entry play count persists to the library database and surfaces as a badge and a sort. SUGGEST builds a continuous playlist from the analyzed library, ordered by Camelot-wheel harmony and a chosen BPM flow across a time budget, then plays it through the footer queue or sends it to the DJ tab as an automix set. The Catalogue view adds a cross-provider gallery with provider badges, an inspector with on-demand spectrograms, and a lineage panel, and it runs Suno cover and mashup from any entry.

### Bottom panel tools

5 changes: 3 additions & 2 deletions backend/modules/library/router.py
@@ -109,7 +109,7 @@ async def stream_audio(entry_id: str) -> Response:
resp.raise_for_status()
audio_bytes = resp.content
# Cache to disk so future requests skip CDN.
local_name = meta.get("audio_filename") or f"{entry_id}.mp3"
local_name = (meta or {}).get("audio_filename") or f"{entry_id}.mp3"
local_path = entry_dir / local_name
try:
local_path.write_bytes(audio_bytes)
@@ -176,7 +176,8 @@ def register_play(entry_id: str) -> dict[str, Any]:
if record is None:
raise HTTPException(404, f"Entry {entry_id!r} not found")
entry_dir = store._dir_for(entry_id) # noqa: SLF001
store._sync_record_to_db(record, _read_metadata(entry_dir) or {}) # noqa: SLF001
meta = _read_metadata(entry_dir) if entry_dir is not None else None
store._sync_record_to_db(record, meta or {}) # noqa: SLF001
new_count = store.db.increment_play_count(entry_id) or 1
return {"id": entry_id, "play_count": new_count}
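
Both hunks in this file apply the same guard: a missing metadata mapping, or a missing entry directory, is treated as an empty dict instead of being dereferenced as `None`. The sketch below isolates that pattern in a standalone form. `read_metadata`, `cache_filename`, and the `metadata.json` file name are hypothetical stand-ins written for illustration; they are not the module's `_read_metadata` or any other function in `router.py`.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_metadata(entry_dir: Optional[Path]) -> Optional[dict[str, Any]]:
    """Hypothetical stand-in: return parsed metadata, or None when unavailable."""
    if entry_dir is None:
        return None
    meta_path = entry_dir / "metadata.json"  # assumed file name, illustration only
    if not meta_path.is_file():
        return None
    return json.loads(meta_path.read_text(encoding="utf-8"))


def cache_filename(meta: Optional[dict[str, Any]], entry_id: str) -> str:
    """Pick the cached audio file name, tolerating meta being None."""
    # `(meta or {})` turns None into an empty dict, so `.get` never raises.
    # The trailing `or` also covers a key that is present but empty.
    return (meta or {}).get("audio_filename") or f"{entry_id}.mp3"
```

With this shape, `cache_filename(None, "abc")` falls back to `abc.mp3` rather than raising `AttributeError`, which is the failure the `(meta or {})` change guards against.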

60 changes: 52 additions & 8 deletions docs/USER_GUIDE.md
@@ -2,15 +2,21 @@

_by GANTASMO_

theDAW is a complete digital audio workstation built on Stable Audio 3, the state-of-the-art open audio model from Stability AI. The model turns a text prompt into high-fidelity 44.1 kHz stereo audio, and theDAW surrounds that core with a full production environment. A React front end and a FastAPI backend run together from one launcher, so the whole studio starts with a single double-click.
theDAW packs most of a music career into one program. It writes original audio from a text prompt, arranges it on a multitrack timeline, masters it through a deep effects chain, splits it into stems, performs it across two beatmatched DJ decks, runs a reactive visual show, transcribes it to sheet music and tablature, and files everything in a searchable library that records how each piece descended from the last. It trains custom models on a collection and answers questions about any of it from a built-in assistant. A separate DAW, audio generator, stem separator, DJ rig, VJ engine, notation editor, sample manager, and model trainer collapse into a single window.

Audio generation happens in the MAKE workspace. A prompt becomes finished audio through Stable Audio 3, and the same model menu also reaches Magenta RealTime 2 for streaming text-to-music and Suno for cloud generation. Chimera fusion can blend several clips into one downbeat-aligned piece, while init signals, inpainting, and LoRA adapters give finer control over the result. When the model selection moves between Stable Audio and the Magenta sidecar, the GPU swap runs on its own.
![The 3D lineage galaxy in the LEARN workspace](screenshots/learn-galaxy.png "full")

Editing and mastering come next. The EDIT workspace opens a piece in a waveform editor that supports region inpainting. Effects and loudness work happens in MIX, which hosts the Edit Tool Stack alongside Quick Master macros. For live use, the DJ workspace provides a two-deck engine with beatmatch sync, key-lock, live stems, an FX rack, hot cues, and hands-free automix. WebGL visuals in the VJ workspace respond to the audio and accept MIDI, microphone, and mobile input.
Generation runs on Stable Audio 3, the open generative audio model from Stability AI. A prompt becomes finished stereo, and the same model menu reaches Magenta RealTime 2 for streaming text-to-music and Suno for cloud renders. Chimera fusion folds several clips into one tempo-aligned track, while init signals, inpainting, and LoRA adapters steer a render toward a target.

Everything generated is kept. The Library stores each piece on disk together with its analysis, stems, and MIDI, and LEARN renders the relationships between pieces as a navigable 3D genealogy. A symbolic-music pipeline produces sheet music, tablature, and multi-instrument arrangements, and it can read a score back into a usable text prompt. LoRA training and autoencoder round-trips have a home in the TRAIN workspace. An in-app assistant answers questions from this guide, and a paste-a-URL importer pulls audio from YouTube, SoundCloud, and Bandcamp.
![The two-deck DJ performance console](screenshots/dj.png "full")

The in-app **Docs** button renders this guide as an interactive modal with a filterable table of contents, raw Markdown download, and print-to-PDF. Each workspace and the backend are documented in full below.
A track then moves through EDIT for waveform surgery and region inpainting, and MIX for the Edit Tool Stack and Quick Master macros. The DJ workspace runs two decks with beatmatch sync, key-lock, live stems, an FX rack, hot cues, and hands-free automix, and the VJ workspace drives WebGL visuals from the audio, a microphone, MIDI, or a phone on the same network.

![Mastering and effects in the MIX workspace](screenshots/mix.png "full")

Nothing is discarded. The Library keeps every piece on disk with its analysis, stems, and MIDI, and LEARN draws the lineage between pieces as a navigable 3D genealogy. A symbolic-music pipeline turns a track into sheet music, tablature, and multi-instrument arrangements, and reads a finished score back into a prompt. TRAIN fits LoRA adapters and runs autoencoder round-trips, an assistant answers from this guide, and a paste-a-URL importer pulls audio from YouTube, SoundCloud, and Bandcamp.

The **Docs** button in the top bar opens this guide as a modal with a filterable table of contents, a Markdown download, and a print-to-PDF in three styles. Every workspace and the full backend API follow below.

---

@@ -310,6 +316,8 @@ A sticky bar fixed at the bottom of the MAKE workspace submits the generation jo

---

![MAKE generation controls and parameters](screenshots/make-controls.png)

## 7. EDIT Tab

### Purpose
@@ -395,6 +403,8 @@ During the render, the COMMIT EDIT button shows an animated spinner and is disab

---

![The EDIT multitrack waveform editor](screenshots/edit.png)

## 8. MIX Tab

### Purpose
@@ -465,6 +475,8 @@ The last processing invocations are retained in the store. Any history item can

---

![The MIX effects and mastering workspace](screenshots/mix-overview.png)

## 9. DJ Tab

### Purpose
@@ -501,7 +513,7 @@ DJ MIDI-learn binds a hardware controller to deck, mixer, and hotcue actions. It

### 9.7 Automix, Sampler, and Side List

- **Automix** sequences a setlist hands-free, beatmatching each transition.
- **Automix** sequences a setlist hands-free, beatmatching each transition. The Library's **Suggest a Playlist** can populate this set and start it in one step through its **Send to DJ** action (see §13.9).
- **Sampler bank**: drag a clip onto a pad to load a one-shot, then trigger pads during a set.
- **Side List**: a play-next staging lane above the browser. Stage upcoming tracks, reorder them, and pull them onto a deck when ready.

@@ -515,6 +527,8 @@ A floating **Edit Layout** control turns on Design Mode. In Design Mode, panels

---

![The DJ center mixer, EQ, and crossfader](screenshots/dj-center-mixer.png)

## 10. VJ Tab

### Purpose
@@ -597,6 +611,8 @@ Long-running training jobs are tracked through `GET /api/jobs/{id}`, polled at 1

---

![The TRAIN workspace for LoRA adapters](screenshots/train.png)

## 12. LEARN Tab

### Purpose
@@ -631,6 +647,8 @@ theDAW carries several other rich visualizations, each documented in its own sec

---

![The 2D lineage family tree in LEARN](screenshots/learn-2d.png)

## 13. Library

### Purpose
@@ -675,13 +693,14 @@ Toggle between a dense **List** view (one row per entry) and a **Grid** view (ti

- **Search** filters across title, prompt, model, tags, and notes at the same time.
- **FAVS** toggle restricts the view to favorited entries.
- **Sort** by Newest (timestamp descending), Duration (longest first), or Title (alphabetical).
- **Sort** by Newest (timestamp descending), Duration (longest first), Title (alphabetical), or Plays (most-played first).
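
The four sort modes listed above can be sketched as a small lookup table of key functions and directions. This is an illustration only: the `Entry` fields and the `SORT_MODES` names are assumptions, and the real library record and front-end sort code may differ.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical fields; the real library record may differ.
    title: str
    timestamp: float  # seconds since the epoch
    duration: float   # seconds
    play_count: int = 0


# Each sort mode maps to (key function, descending?).
SORT_MODES = {
    "newest": (lambda e: e.timestamp, True),         # timestamp descending
    "duration": (lambda e: e.duration, True),        # longest first
    "title": (lambda e: e.title.casefold(), False),  # alphabetical, case-insensitive
    "plays": (lambda e: e.play_count, True),         # most-played first
}


def sort_entries(entries, mode):
    """Return a new list ordered by the chosen mode."""
    key, descending = SORT_MODES[mode]
    return sorted(entries, key=key, reverse=descending)
```

Because `sorted` is stable, entries that tie on the chosen key keep their previous relative order, so a never-played library sorted by Plays stays in whatever order it was already in.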

### 13.4 Per-entry Controls

| Control | Description |
|---|---|
| **Play / Pause** | Loads and plays the entry through `playerStore`. Pausing one entry while another is active stops playback globally. |
| **Play count** | A badge on each played entry shows how many times it has played. The first play after a load increments a per-entry tally persisted in the library database (`POST /api/library/entries/{id}/play`), so the count survives restarts and metadata edits. |
| **Favorite star** | Toggles the favorite flag; persisted to the backend immediately. |
| **Download** | Triggers a browser file download of the audio. |
| **Delete** | Removes the entry from the backend store and the in-memory store. |
@@ -733,7 +752,18 @@ Important endpoints:

A stats footer shows the total entry count, the favorites count, cumulative storage size, and cumulative playback duration.

### 13.9 Empty State
### 13.9 Suggest a Playlist

The **SUGGEST** button opens a playlist builder that sequences analyzed tracks into a continuous set. The criteria are a target length, an optional BPM range, a flow shape (Steady, Build up, Wind down, or Wave), a harmonic toggle, and an optional genre or text filter. The backend engine (`POST /api/library/suggest-playlist`) reads each track's analysis and orders the set by harmonic key on the Camelot wheel, the chosen BPM flow, and small nudges toward popular and stylistically varied picks, filling the time budget. Each result row shows its BPM, Camelot code, and the reason it was chosen.

Two actions run the result:

- **Play All** loads the sequence into the footer player and auto-advances track to track, restoring the loop preference when it finishes.
- **Send to DJ** loads the order as the active automix set, switches to the DJ tab, and starts the beatmatch-crossfade automix (see §9.7).

Suggestions are strongest when the library is analyzed. Unanalyzed tracks still fill the budget but cannot be harmonically sequenced; running the analyzer, or leaving the analyze-on-add toggle on, closes this gap over time.
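
As a rough illustration of harmonic ordering on the Camelot wheel, the sketch below scores key compatibility and greedily chains tracks until a time budget is met. It is a simplified stand-in, not the backend engine behind `/api/library/suggest-playlist`: the distance measure, the greedy strategy, and the dict fields (`camelot`, `bpm`, `duration`) are assumptions, and it ignores the flow shapes and the popularity and variety nudges described above.

```python
def parse_camelot(code: str) -> tuple[int, str]:
    """Split a code such as '8A' into (8, 'A')."""
    number, letter = int(code[:-1]), code[-1].upper()
    if not 1 <= number <= 12 or letter not in "AB":
        raise ValueError(f"not a Camelot code: {code!r}")
    return number, letter


def camelot_distance(a: str, b: str) -> int:
    """0 for the same key, 1 for a neighbouring key, larger for clashes."""
    (na, la), (nb, lb) = parse_camelot(a), parse_camelot(b)
    steps = min((na - nb) % 12, (nb - na) % 12)  # shortest way round the wheel
    return steps + (0 if la == lb else 1)        # +1 for switching A <-> B


def order_set(tracks, budget_seconds):
    """Greedy chain: start from the first track, then keep picking the
    remaining track with the closest key, breaking ties by smallest BPM jump."""
    remaining = list(tracks)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    total = ordered[0]["duration"]
    while remaining and total < budget_seconds:
        last = ordered[-1]
        nxt = min(
            remaining,
            key=lambda t: (
                camelot_distance(last["camelot"], t["camelot"]),
                abs(last["bpm"] - t["bpm"]),
            ),
        )
        remaining.remove(nxt)
        ordered.append(nxt)
        total += nxt["duration"]
    return ordered
```

Under this measure, `8A` sits one step from `7A`, `9A`, and `8B`, which matches the usual Camelot mixing rule of moving one number around the wheel or switching letter at the same number.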

### 13.10 Empty State

Shown until the first generation. It contains a **Go generate something** button that switches the active workspace to MAKE.

@@ -795,6 +825,8 @@ These exports use the same PPQ timing constants as live playback (`PPQ = 480`, o

---

![The 16-step sequencer with five synthesized voices](screenshots/sequencer.png)

## 15. Piano Roll

### Purpose
@@ -839,6 +871,8 @@ Clips in the waveform editor whose `sourceKind` is `'piano-roll'` display an **E

---

![The piano roll with MIDI import and export](screenshots/piano.png)

## 16. Bottom Panel Tabs

The bottom panel is collapsible and vertically resizable (drag the grip handle above it), and a maximize toggle expands any tab to fill the window. Six tabs are available.
@@ -867,6 +901,8 @@ Text in the overlay uses `textShadow` for legibility against any visualization b

**Canvas scaling:** the canvas is sized to its container in physical pixels (device pixel ratio capped at 2×) through a `ResizeObserver`. The `style` dimensions are set in CSS pixels, so the canvas stays crisp on high-DPI displays.
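
The sizing arithmetic behind the canvas-scaling note can be sketched as follows. This covers only the calculation; the real code runs in the browser inside a `ResizeObserver` callback, and the function name and signature here are assumptions made for illustration.

```python
def backing_store_size(
    css_width: float,
    css_height: float,
    device_pixel_ratio: float,
    max_ratio: float = 2.0,
) -> tuple[int, int]:
    """Physical pixel size for a canvas shown at the given CSS size.

    The ratio is capped so very dense displays do not multiply the
    number of pixels drawn on every frame.
    """
    ratio = min(device_pixel_ratio, max_ratio)
    return round(css_width * ratio), round(css_height * ratio)
```

An 800 by 300 CSS-pixel canvas on a 3x display therefore gets a 1600 by 600 backing store, not 2400 by 900, while its `style` dimensions stay at 800 by 300.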

![The real-time spectral analyzer](screenshots/visualizer.png)

### 16.2 Piano

Full piano roll interface embedded in the bottom panel. See [§15](#15-piano-roll).
@@ -913,6 +949,8 @@ A session-scoped file holding area for arbitrary audio files. Contents are clear

The SLIDE tab is the glass-capsule control surface that mirrors the VJ engine's control manifest as faders. Moving a SLIDE fader updates the matching control in the VJ, and moving a control inside the VJ updates SLIDE. A content toggle switches which control set is shown, a detach button pops SLIDE out into its own window for a second monitor, and the shared maximize toggle expands it to fill the panel.

![The SLIDE glass control surface](screenshots/slide.png)

### 16.7 Score

The Score tab renders a track's symbolic music — sheet music, guitar and bass tabs, and arrangements — from its MIDI artifacts, and exports to MusicXML, ABC, PDF, and SVG. It is documented in full in §33.
@@ -1729,6 +1767,8 @@ The result runs on Windows through WSL2 with NVIDIA, on native Linux with NVIDIA

---

![The Magenta RealTime 2 conditioning panel](screenshots/magenta.png)

## 28. Edit Tool Stack

Beyond the 24-effect MIX chain (§8), theDAW mounts the **Edit Tool Stack**, six backend module families under `/api/edit/*`. Each family provides a focused set of audio processors built on FFmpeg, NumPy, and librosa DSP. The browser GUIs come from `frontend/public/edit-modules/` and iframe into the MIX effect stage.
@@ -1761,6 +1801,8 @@ The **Catalogue** view (`CatalogueView`, lazy-loaded in the shell) is a cross-pr

---

![The cross-provider Catalogue browser](screenshots/catalogue.png)

## 30. YouTube Import

The `ytimport` module (`/api/ytimport`) imports audio from a URL into the Library.
@@ -1820,6 +1862,8 @@ theDAW turns audio into symbolic music and back: audio → MIDI → sheet music,

A track needs a MIDI first. Convert one from the Library (right-click → Convert to MIDI, §13.7); once a MIDI artifact exists, the Score buttons activate.

![Guitar tablature rendered from a track's MIDI in the Score panel](screenshots/score.png)

### 33.1 Notation artifacts

Every symbolic file a track produces is a notation artifact with a kind: `midi`, `musicxml`, `abc`, `alphatex` (tabs), `pdf`, or `svg`. The Score panel's left rail lists them; selecting one previews it (MusicXML as sheet music through OpenSheetMusicDisplay, alphaTex as tablature through alphaTab), and DOWNLOAD saves it. Artifacts are stored under `data/generations/<entry_id>/notation/` and tracked in the library database with lineage back to their source MIDI or score.
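
The artifact kinds and storage layout described above could be modelled roughly as below. The six kind strings, the directory layout, and the two previewer names come from this section; the `NotationKind` enum, `notation_dir`, and `PREVIEWERS` are hypothetical names introduced for illustration and are not the backend's actual types.

```python
from enum import Enum
from pathlib import Path


class NotationKind(str, Enum):
    MIDI = "midi"
    MUSICXML = "musicxml"
    ABC = "abc"
    ALPHATEX = "alphatex"  # tablature
    PDF = "pdf"
    SVG = "svg"


# Kinds the Score panel previews in place, and the renderer named for each.
PREVIEWERS = {
    NotationKind.MUSICXML: "OpenSheetMusicDisplay",
    NotationKind.ALPHATEX: "alphaTab",
}


def notation_dir(data_root: Path, entry_id: str) -> Path:
    """Directory holding one entry's notation artifacts."""
    return data_root / "generations" / entry_id / "notation"
```

For example, `notation_dir(Path("data"), "abc123")` resolves to `data/generations/abc123/notation`.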