Skip to content

Feat/own sft model#38

Merged
gitsad merged 21 commits into
mainfrom
feat/own-sft-model
Jun 30, 2026
Merged

Feat/own sft model#38
gitsad merged 21 commits into
mainfrom
feat/own-sft-model

Conversation

@gitsad

@gitsad gitsad commented Jun 30, 2026

Copy link
Copy Markdown
Member

What does this PR do?

Adds first-class support for Mobile Reality's own self-hosted MDMA-IL model
(fine-tuned Gemma-4 + MDMA-IL LoRA, now public on
Hugging Face)
alongside the third-party models, plus the prompts, evals, and validator/parser
hardening that make its DSL→MDMA output reliable.

Highlights:

  • prompt-pack — new mobile-reality/mdma-il author + agent prompts and
    google/gemma author + fixer prompts, wired into the author registry. The
    mobile-reality/mdma-il author variant is DSL-aware (DSL input grammar +
    authoring rules + worked examples) and is the single source of truth (it was
    previously duplicated in the eval harness).
  • validator — detects HTML/XML-style component tags (<thinking …>,
    <think>, etc.) as invalid MDMA under a dedicated html-tags rule id, so a
    consumer can exclude it without silencing real YAML errors. (Our DSL model
    occasionally leaks these.)
  • parser — auto-quotes AI-generated YAML edge cases that crash parsing:
    values starting with a YAML indicator char (unit: %, range: > 40 mg/dL)
    and colon-space values (label: Example: Revenue).
  • evals — a self-contained evals/own-model/ suite (private package) that
    gates the hosted DSL model on the held-out scenarios, plus its assertions.
  • demo — autoplay preview driving the README comparison, and own-model
    provider wiring in the agent client.
  • docs — README speed comparison (GPT-5.5 vs. our hosted model) and
    Hugging Face links.

Note: the Gemma RL/dataset-generation suite is intentionally kept local
(gitignored), not part of this PR.

Closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing behavior to change)
  • Refactor (no functional change)
  • Documentation
  • CI / tooling

Packages Affected

  • @mobile-reality/mdma-spec
  • @mobile-reality/mdma-parser
  • @mobile-reality/mdma-runtime
  • @mobile-reality/mdma-attachables-core
  • @mobile-reality/mdma-renderer-react
  • @mobile-reality/mdma-prompt-pack
  • Blueprint: ___________
  • Other: @mobile-reality/mdma-validator, @mobile-reality/mdma-cli, demo, evals (private)

Checklist

  • I have read the CONTRIBUTING guide.
  • My code follows the existing code style (pnpm format and pnpm lint pass).
  • I have added or updated tests that cover my changes.
  • All tests pass (pnpm test).
  • Type-checking passes (pnpm typecheck).
  • I have added a changeset (pnpm changeset) if this change affects published packages.
  • New or changed MDMA schemas are backwards-compatible (or marked as breaking).
  • Sensitive fields are marked with sensitive: true where appropriate.

How to Test

  1. pnpm install && pnpm build
  2. Prompt: confirm getAuthorPromptVariant('mobile-reality/mdma-il').prompt
    returns the DSL-aware prompt (DSL grammar + worked examples).
  3. Validator: a document containing <thinking>…</thinking> fails with an
    html-tags error; excluding html-tags suppresses only that error.
  4. Parser: values like unit: % and label: Example: Revenue parse to
    strings instead of throwing / nesting.
  5. Own-model evals (optional, needs a live endpoint): set OWN_MODEL_* in
    evals/.env, then pnpm --filter @mobile-reality/mdma-evals eval:own-model.
  6. Docs: open README.md and verify the speed-comparison section renders the
    two GIFs side by side and the Hugging Face links resolve.

Screenshots / Examples

See the README Speed comparison section (assets/gpt-5.5.gif vs.
assets/own-model.gif).

gitsad and others added 21 commits June 15, 2026 09:46
Add a scoped `eval:gemma` target that runs the MDMA author prompt against
Gemma 4 (via OpenRouter). Model selection is comment-toggleable in
promptfooconfig.gemma.yaml (26B-a4b active, 31B ready to swap in). Outputs
write to the scoped evals/gemma/ directory — kept out of the root
evals/results*.json gitignore so generated MDMA can be reused downstream.

Baseline (gemma-4-26b-a4b-it): 28/28 cases pass the validator suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ation, not published)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The gemma eval suite and dataset generator are kept local (gitignored) and
not published, so remove the eval:gemma* and dataset:* scripts that point at
gemma/ paths. own-model and base eval scripts are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the 5 root-level internal planning/serving docs (endpoint URLs, Modal auth
scheme, budgets, troubleshooting) and scrub references to them from evals configs,
prompts, and own-model README. Kept locally, not published.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or prompt

The mdma-il model reads an MDMA-IL DSL intent, so its system prompt must
describe the DSL grammar; the previous variant had none. Promote the eval
harness's DSL-aware authoring prompt (grammar + worked examples) to the
prompt-pack variant as the single source of truth, and repoint the
author/custom/conversation eval suites to import it via getAuthorPromptVariant.
Delete the now-duplicated evals/own-model/authoring-system-prompt.mjs and refresh
the stale 'thin prompt' wording in the README/configs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gitsad gitsad merged commit 645e1fc into main Jun 30, 2026
1 check passed
@gitsad gitsad deleted the feat/own-sft-model branch June 30, 2026 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants