Feat/own sft model by gitsad · Pull Request #38 · MobileReality/mdma

gitsad · 2026-06-30T16:44:17Z

What does this PR do?

Adds first-class support for Mobile Reality's own self-hosted MDMA-IL model
(fine-tuned Gemma-4 + MDMA-IL LoRA, now public on
Hugging Face)
alongside the third-party models, plus the prompts, evals, and validator/parser
hardening that make its DSL→MDMA output reliable.

Highlights:

prompt-pack — new mobile-reality/mdma-il author + agent prompts and
google/gemma author + fixer prompts, wired into the author registry. The
mobile-reality/mdma-il author variant is DSL-aware (DSL input grammar +
authoring rules + worked examples) and is the single source of truth (it was
previously duplicated in the eval harness).
validator — detects HTML/XML-style component tags (<thinking …>,
<think>, etc.) as invalid MDMA under a dedicated html-tags rule id, so a
consumer can exclude it without silencing real YAML errors. (Our DSL model
occasionally leaks these.)
parser — auto-quotes AI-generated YAML edge cases that crash parsing:
values starting with a YAML indicator char (unit: %, range: > 40 mg/dL)
and colon-space values (label: Example: Revenue).
evals — a self-contained evals/own-model/ suite (private package) that
gates the hosted DSL model on the held-out scenarios, plus its assertions.
demo — autoplay preview driving the README comparison, and own-model
provider wiring in the agent client.
docs — README speed comparison (GPT-5.5 vs. our hosted model) and
Hugging Face links.

Note: the Gemma RL/dataset-generation suite is intentionally kept local
(gitignored), not part of this PR.

Closes #

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing behavior to change)
Refactor (no functional change)
Documentation
CI / tooling

Packages Affected

@mobile-reality/mdma-spec
@mobile-reality/mdma-parser
@mobile-reality/mdma-runtime
@mobile-reality/mdma-attachables-core
@mobile-reality/mdma-renderer-react
@mobile-reality/mdma-prompt-pack
Blueprint: ___________
Other: @mobile-reality/mdma-validator, @mobile-reality/mdma-cli, demo, evals (private)

Checklist

I have read the CONTRIBUTING guide.
My code follows the existing code style (pnpm format and pnpm lint pass).
I have added or updated tests that cover my changes.
All tests pass (pnpm test).
Type-checking passes (pnpm typecheck).
I have added a changeset (pnpm changeset) if this change affects published packages.
New or changed MDMA schemas are backwards-compatible (or marked as breaking).
Sensitive fields are marked with sensitive: true where appropriate.

How to Test

pnpm install && pnpm build
Prompt: confirm getAuthorPromptVariant('mobile-reality/mdma-il').prompt
returns the DSL-aware prompt (DSL grammar + worked examples).
Validator: a document containing <thinking>…</thinking> fails with an
html-tags error; excluding html-tags suppresses only that error.
Parser: values like unit: % and label: Example: Revenue parse to
strings instead of throwing / nesting.
Own-model evals (optional, needs a live endpoint): set OWN_MODEL_* in
evals/.env, then pnpm --filter @mobile-reality/mdma-evals eval:own-model.
Docs: open README.md and verify the speed-comparison section renders the
two GIFs side by side and the Hugging Face links resolve.

Screenshots / Examples

See the README Speed comparison section (assets/gpt-5.5.gif vs.
assets/own-model.gif).

Add a scoped `eval:gemma` target that runs the MDMA author prompt against Gemma 4 (via OpenRouter). Model selection is comment-toggleable in promptfooconfig.gemma.yaml (26B-a4b active, 31B ready to swap in). Outputs write to the scoped evals/gemma/ directory — kept out of the root evals/results*.json gitignore so generated MDMA can be reused downstream. Baseline (gemma-4-26b-a4b-it): 28/28 cases pass the validator suite. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ation, not published) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The gemma eval suite and dataset generator are kept local (gitignored) and not published, so remove the eval:gemma* and dataset:* scripts that point at gemma/ paths. own-model and base eval scripts are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the 5 root-level internal planning/serving docs (endpoint URLs, Modal auth scheme, budgets, troubleshooting) and scrub references to them from evals configs, prompts, and own-model README. Kept locally, not published. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…or prompt The mdma-il model reads an MDMA-IL DSL intent, so its system prompt must describe the DSL grammar; the previous variant had none. Promote the eval harness's DSL-aware authoring prompt (grammar + worked examples) to the prompt-pack variant as the single source of truth, and repoint the author/custom/conversation eval suites to import it via getAuthorPromptVariant. Delete the now-duplicated evals/own-model/authoring-system-prompt.mjs and refresh the stale 'thin prompt' wording in the README/configs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitsad and others added 21 commits June 15, 2026 09:46

feat: added gemma evals

8bf3b96

chore: working gemma and wip holdout

99db9e6

chore: regression handout almost ready

6a7ffc7

feat: prepare for running local model

0a9403e

feat: almost 100% in eval with custom

a1b3d99

feat: working on 26B MoE

56ae461

feat: working other oevals

a5d3c2f

feat: working connection with modal on gemma

fce393b

fix: fixed system prompt for increasing scores for own model

276c2d1

feat: passing all tests on temp 1

47a5cb2

feat: added html validtion for our own model

4d6b877

fix: improved overthinking

90d6803

fix: final fixes for prose

e2c9ead

chore: upadted agent settings

6192ab1

chore: updated README

e310267

chore: gitignore local gemma eval suite (kept local for dataset gener…

1349d25

…ation, not published) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore: add changeset for DSL-aware mdma-il prompt variant

4a04d6f

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

NtTestAlert approved these changes Jun 30, 2026

View reviewed changes

gitsad merged commit 645e1fc into main Jun 30, 2026
1 check passed

gitsad deleted the feat/own-sft-model branch June 30, 2026 16:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feat/own sft model#38

Feat/own sft model#38
gitsad merged 21 commits into
mainfrom
feat/own-sft-model

gitsad commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

gitsad commented Jun 30, 2026

What does this PR do?

Type of Change

Packages Affected

Checklist

How to Test

Screenshots / Examples

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants