Feat/v1.2#2

Merged
john-lawniczak merged 78 commits into main from feat/v1.2
Jun 12, 2026

@john-lawniczak

Owner

Type

  • Add (new feature/rule/endpoint)
  • Fix (bug fix)
  • Change (behavior change)
  • Remove (deletion)
  • Refactor (no behavior change)
  • Docs
  • Test
  • Perf
  • Build (CI/packaging/deps)

Summary

How it was tested

Checklist

  • PR title uses a type tag (Add:, Fix:, ...).
  • pytest -q passes.
  • python tools/check_maintainability.py --strict passes.
  • cd ui && node --test passes.
  • New behavior and any fixed regression have tests.
  • No patient data; no verdict language; mechanics stay in deterministic rules.
  • Docs updated if behavior or provider behavior changed.

Click a tooth in the 3D preview to select it, then nudge its final in-plane
position (mesiodistal x + front-back y) in 0.2 mm steps. The authored target is
written as a normal source:"manual" stage delta, so the engine still computes
all movement and Generate Plan re-stages it via the existing authored path
(aligner count from the standard timeline projection).

Honesty constraints baked in: translation only (no rotation - scan frame is
rotation_renderable=false; no vertical z), gated on confirmed scan units (mm),
and framed as a geometric target, not a treatment goal or approval.

- ui/manual_edit.js (pure/DOM-free) + ui/manual_edit.test.js (13 cases)
- ui/viewer3d.js: proxies tagged with userData.tooth; click-raycast picking
  (click-vs-orbit threshold) + setSelectionHandler/Enabled/Tooth + highlight
- ui/render.js: renderManualEdit() panel + selection wiring
- ui/app.js: nudge/reset/deselect handlers; ui/state.js: manualEdit state
- ui/index.html + ui/styles.css: Manual target panel
- No backend/segmentation-API change: picking resolves the tooth from the
  rendered crown the segmentation API already provides

Tests: 38 UI, 292 pytest, maintainability --strict all green.

Introduce an optional Open3D mesh-processing extra and a hybrid arch segmenter that scores candidate tooth boundaries from arch position, crown-height valleys, curvature, and face-normal changes before producing graph-cut-style per-tooth STL proposals.

Wire the hybrid segmenter through the existing SegmentationModel seam and /api/segment response metadata while keeping the prior heuristic as fallback. Add tests for the hybrid diagnostics, backend metadata, and safer manual review behavior.

Improve the segmentation review UI by surfacing proposal method/backend details, flagging low-confidence rows, and rejecting duplicate corrected FDI tooth numbers during explicit apply.

Refresh README, HOW_TO, architecture, UI, license, and source-reference docs to reflect the optional Open3D path and the legal boundary around GPL external tools.

Make tooth-segmentation accuracy measurable. Until now the hybrid segmenter
shipped without any correctness metric - nothing checked whether a proposed cut
landed on the right boundary or carried the right FDI label, so changes could
not be proven to improve accuracy.

- orthoplan/validation/segmentation_truth.py: build a synthetic arch whose
  per-triangle tooth membership is known by construction (cosine crowns +
  valleys on a horseshoe), run the active segmenter, and score it. Metrics:
  region_purity (boundary quality, label-invariant), triangle_label_accuracy
  (boundary + correct FDI label), labels_recovered, mean_centroid_arc_error_deg.
- orthoplan/validation/segmentation_cases.py: two Measurement Truth Lab cases on
  the active load_local_segmenter() - segmentation-full-arch-accuracy (PASS gate:
  exact count, purity >= 0.78, label_acc >= 0.55) and segmentation-missing-tooth
  (records tooth_count_error from the fixed-canonical-count assumption; gates
  purity + labels_recovered). Synthetic, PHI-free, deterministic, CI-fast.
- Wire both into measurement_truth_cases(); surfaced via orthoplan measurement-lab.
- tests/test_segmentation_truth.py (7 cases).

Baseline on a clean synthetic full arch: purity 0.826, label_acc 0.629 - the
segmenter mislabels ~37% of triangles even on ideal input, and a missing tooth
yields tooth_count_error 1. The purity/label-accuracy gap is the labelling
cascade, which this harness now measures so the next task (data-driven tooth
count + missing-tooth handling) can prove a gain.
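
The purity/label-accuracy gap can be illustrated with a minimal sketch. The definitions below are assumptions inferred from the metric descriptions above (majority-vote purity per proposed region, exact per-triangle label match), not the code in segmentation_truth.py; the signatures and toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def region_purity(true_labels, pred_labels):
    """Assumed definition: share of triangles that belong to the majority
    true tooth of their proposed region. Label names do not matter."""
    by_region = defaultdict(Counter)
    for truth, pred in zip(true_labels, pred_labels):
        by_region[pred][truth] += 1
    total = sum(sum(c.values()) for c in by_region.values())
    if total == 0:
        return 0.0
    majority = sum(max(c.values()) for c in by_region.values())
    return majority / total


def triangle_label_accuracy(true_labels, pred_labels):
    """Assumed definition: share of triangles whose proposed FDI label
    equals the ground-truth label."""
    pairs = list(zip(true_labels, pred_labels))
    if not pairs:
        return 0.0
    return sum(truth == pred for truth, pred in pairs) / len(pairs)


# Toy data: perfect boundaries, but every label shifted by one tooth.
truth = [11, 11, 11, 12, 12, 12]
shifted = [12, 12, 12, 13, 13, 13]
print(region_purity(truth, shifted), triangle_label_accuracy(truth, shifted))
```

On the toy input the boundaries are perfect (purity 1.0) while every label is wrong (accuracy 0.0), which is the cascade the harness is meant to expose.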

Tests: 300 pytest, maintainability --strict, 38 UI tests all green.

The segmenter assumed a full canonical 14-tooth arch and always made that many
cuts, so an arch with a tooth missing was over-split (one real tooth fragmented,
all labels cascaded). The accuracy harness pinned this as tooth_count_error 1.

Detect the count from the scan instead of assuming it:
- arch_profile.detect_cut_count: counts prominent embrasures (minima for the
  height profile, maxima for the hybrid cost signal) with min-separation dedup,
  capped at canonical; returns 0 (caller falls back to canonical) on a flat or
  unusable signal so low-relief scans never collapse.
- heuristic.resolve_tooth_count / teeth_from_profile: bounded count + FDI labels.
  Both segmenters count from the physical HEIGHT valleys (most reliable) and keep
  their own signal for boundary PLACEMENT. Shared arch_profile.arc_signal extracted.
- On a count != canonical: per-segment confidence x0.6 and a linted review
  advisory (auto.build_count_advisories) via /api/segment, because which tooth is
  absent cannot be known from crown geometry - FDI labels on a gap arch are a
  positional guess the user must review (follow-on: a 'mark missing tooth' signal).
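
A minimal sketch of the counting idea for the height-profile (minima) case. It is an illustration under stated assumptions, not the arch_profile implementation: the signature, the 0.25 prominence ratio, the crude depth measure, and the separation of 2 buckets are all hypothetical.

```python
def detect_cut_count(profile, canonical_cuts, prominence_ratio=0.25, min_separation=2):
    """Count prominent valleys in a 1-D height profile.

    Returns 0 on a flat or unusable signal so the caller can fall back to
    the canonical count. All thresholds here are illustrative assumptions.
    """
    if len(profile) < 3:
        return 0
    span = max(profile) - min(profile)
    if span <= 1e-9:
        return 0  # flat signal: no relief to count from
    threshold = prominence_ratio * span

    candidates = []
    for i in range(1, len(profile) - 1):
        if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]:
            # Crude depth: drop below the lower of the two flanking maxima.
            depth = min(max(profile[:i]), max(profile[i + 1:])) - profile[i]
            if depth >= threshold:
                candidates.append((depth, i))

    # Min-separation dedup: keep the deepest valleys first.
    kept = []
    for _depth, i in sorted(candidates, key=lambda c: (-c[0], c[1])):
        if all(abs(i - j) >= min_separation for j in kept):
            kept.append(i)

    return min(len(kept), canonical_cuts)  # capped at canonical


print(detect_cut_count([3, 1, 3, 1, 3, 1, 3], canonical_cuts=13))
print(detect_cut_count([1.0] * 8, canonical_cuts=13))
```

Returning 0 instead of raising keeps the fallback decision with the caller, matching the "caller falls back to canonical" behaviour described above.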

Harness ratcheted: segmentation-missing-tooth now asserts tooth_count_error == 0
and region_purity >= 0.80. On the 13-tooth synthetic arch: 13/13 regions (err 0),
purity 0.83 -> 0.87. Full arch unchanged (14/14). triangle_label_accuracy drops on
gap arches (0.61 -> 0.19) - the recorded, honest cost of unsolved FDI-on-gaps;
region quality and count are the geometry-solvable win this delivers.

Tests: 303 pytest (+3), maintainability --strict, 41 UI tests all green.
…& Photos guide tab

Investigation: the sidebar 'Key Terms & Tooth Map' link (and the FDI tooth map
inside it) appeared dead for new users. Root cause is a mode/CSS conflict, not a
broken handler: the link lives in the sidebar (visible in both modes) and clicking
it sets activeStep=glossary and activates #panel-glossary, but
'body[data-mode="simple"] .panel { display: none }' keeps every technician panel
hidden in the default guided mode - so nothing appears.

Fix:
- CSS override (ID selectors beat the mode rule): when the active step is an info
  panel in guided mode, surface #panel-glossary / #panel-photos over the wizard
  and hide #guided.
- Back affordance so the panels are not dead ends: goToStep() remembers the prior
  step (state.returnStep) when an info step opens; a 'Back' button in each panel
  returns there. INFO_STEPS = [glossary, photos].

New tab - Imaging & Photos Guide (#panel-photos), mirroring the glossary and
reachable from the sidebar in both modes:
- Relative-value table (CBCT DICOM 10/10, STL+Periapical 7-8/10, STL+Panoramic
  7/10, STL+Photos 6/10, STL only 5/10, Panoramic only 5/10, Photos only 3/10)
  with what each input adds and a rough USD cost.
- Framed as how much each record helps THIS engine close data gaps (the planner
  uses crown-surface STL geometry only); educational, not medical advice, and
  noting X-ray/CBCT are ionizing radiation ordered by a licensed professional.

Files: ui/index.html, ui/state.js, ui/app.js, ui/styles.css, docs/UI_DESIGN.md.
Validated: 41 UI tests, JS syntax, HTML structure + unique ids; e2e selectors
untouched (additive), runs in CI.

Geometry cannot tell WHICH tooth is absent, so on a gap arch the data-driven
count gets the right number of regions but only positional-guess FDI labels
(triangle label accuracy ~0.19 on the synthetic gap arch). This adds the user
signal that closes
that gap: the user marks the missing tooth, and regions are labelled by the
canonical order minus that tooth.

- auto.SegmentationModel.segment now accepts tooth_values; auto.tooth_values_for_arch
  builds the explicit labels from marked gaps (filtered to the scan's arch).
- segmentation_api parses payload.missing_teeth and threads it per-arch into the
  segmenter; the count-difference advisory still surfaces for review.
- UI: a 'Missing teeth (FDI, optional)' field in the segmentation panel;
  segment.parseMissingTeeth (pure, unit-tested) sends {missing_teeth} to /api/segment.
- Harness: new segmentation-missing-tooth-marked case proves the gain - marked
  label accuracy 0.60 vs unmarked 0.19 on the 13-tooth synthetic arch, purity held.

This turns the recorded triangle_label_accuracy drop from the data-driven-count
change back into a gain, and pairs with the per-tooth manual correction already
in the review UI.
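
"Canonical order minus that tooth" can be sketched as below. The FDI numbers are the standard 14-tooth arches (third molars excluded); the arch keys, the left-to-right ordering, and the function signature are assumptions for illustration and may differ from auto.tooth_values_for_arch.

```python
# Standard FDI numbering for a 14-tooth arch (third molars excluded).
# The arch key names and ordering direction are assumptions.
CANONICAL_FDI = {
    "maxillary": [17, 16, 15, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 27],
    "mandibular": [47, 46, 45, 44, 43, 42, 41, 31, 32, 33, 34, 35, 36, 37],
}


def tooth_values_for_arch(arch, missing_teeth):
    """Return the explicit labels for one arch: the canonical order with the
    user-marked gaps removed. Marks belonging to the other arch are ignored."""
    canonical = CANONICAL_FDI[arch]
    missing = {tooth for tooth in missing_teeth if tooth in canonical}
    return [tooth for tooth in canonical if tooth not in missing]


# 14 is an upper tooth; 36 is a lower tooth and is filtered out here.
labels = tooth_values_for_arch("maxillary", [14, 36])
print(len(labels), labels)
```

With 13 regions detected and 13 labels produced, each region can be labelled in arch order, so the labels line up around the marked gap.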

Tests: 306 pytest (+5), maintainability --strict, 40 UI tests all green.
…tabs

The single 'Key Terms & Tooth Map' panel held both the FDI tooth map and the A-Z
glossary. Split into two independent sidebar tabs:
- Tooth Map (#panel-toothmap): FDI numbering explainer, chart, quadrant map.
- Glossary (#panel-glossary): searchable key-terms list.

Both (plus Imaging & Photos Guide) are reachable from the sidebar in either mode,
surface over the guided wizard when active, and have their own Back button.
INFO_STEPS += toothmap; the guided-mode CSS override covers #panel-toothmap.

Validated: 40 UI tests, HTML structure + unique ids; e2e selectors untouched.

Surface what the engine already computes and close the loop with the
mark-the-missing-tooth signal:

- Per-tooth confidence in the review rows is now tier-coloured (low <45 red /
  mid <65 amber / high green) with a 'Review' tag on low, so low-confidence and
  count-mismatch teeth stand out instead of reading as a uniform bar.
- When a proposed arch is not a full arch (14), an amber banner prompts the
  reviewer to enter the missing FDI number in 'Missing teeth' and use a new
  'Re-anchor labels' button, which re-runs the proposal with the marked gap (same
  code path as Propose) so the FDI labels line up around the gap - no re-upload.
- Pure logic (confidenceTier, countNoteMarkup, FULL_ARCH_TEETH) lives in DOM-free
  core.js and is unit-tested; render.js imports it; app.js wires reanchorSegment.

Tests: 43 UI tests (+3), maintainability --strict green.

The auto-segmentation review (proposal, per-tooth corrections, marked missing
teeth, applied fragment) was lost on every page reload. It is working state, not
plan data, so it is now persisted browser-local in localStorage keyed by plan id
- without polluting the TreatmentPlan model (which the version snapshot validates
strictly).

- storage.js: saveSegmentationReview / restoreSegmentationReview /
  clearSegmentationReview (localStorage, plan-id keyed, failures non-fatal).
- app.js: persist on propose/re-anchor, apply, per-tooth correction, include
  toggle, and missing-teeth input; restoreStoredSegmentation() on load.
- restorePlan now rehydrates state.segmentation.applied from the restored
  snapshot's mesh_assets/tooth_meshes (previously dropped on restore) and reloads
  the plan's review draft; the snapshot's applied meshes win for that version.

Tests: 45 UI tests (+2 storage round-trip and no-plan-id/no-storage guards).
maintainability --strict green (no Python changed).

Model the real open-extraction-gap case in the segmentation harness, then fix the
bug it found.

- build_synthetic_arch(gaps=...) fills a tooth's sector with a flat low gum
  surface (no crown), leaving a true one-tooth-wide hole; gum triangles carry no
  ground-truth tooth and the arch's tooth_values are the crowns actually present.
- The harness immediately surfaced a real bug: the data-driven counter, which
  counted the VALLEYS between crowns, over-counted an open gap - the wide gum
  floor's two shoulders each read as a cut, so a 13-crown arch was counted as 14.
- Fix: resolve_tooth_count now counts crown PEAKS in the height profile instead of
  valleys. A gum hole has no peak, so it can never read as an extra tooth. Correct
  across every scenario - full 14/14, congenitally-absent 13/13, open extraction
  gap 13/13, two adjacent extractions 12/12 (tooth_count_error 0 throughout).
- New lab case segmentation-open-gap gates the count and purity on the gum-hole
  arch; resolve_tooth_count signature simplified (always crown peaks).
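
Why a gum hole cannot read as a tooth under peak counting can be shown on a 1-D stand-in for the synthetic arch. This is a sketch under assumptions: the real build_synthetic_arch produces a triangle mesh, and the bucket count, heights, and prominence rule here are hypothetical.

```python
import math


def synthetic_height_profile(n_teeth=14, gaps=(), buckets_per_tooth=7,
                             crown=1.0, gum=0.1):
    """1-D height profile: one cosine crown per tooth, flat gum for gaps."""
    heights = []
    for tooth in range(n_teeth):
        for b in range(buckets_per_tooth):
            if tooth in gaps:
                heights.append(gum)  # extraction site: gum floor, no crown
            else:
                phase = (b + 0.5) / buckets_per_tooth
                bump = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))
                heights.append(gum + crown * bump)
    return heights


def prominence(heights, i):
    """Height of sample i above the higher of its two flanking minima,
    walking outward until strictly higher terrain is met."""
    left = right = heights[i]
    j = i - 1
    while j >= 0 and heights[j] <= heights[i]:
        left = min(left, heights[j])
        j -= 1
    j = i + 1
    while j < len(heights) and heights[j] <= heights[i]:
        right = min(right, heights[j])
        j += 1
    return heights[i] - max(left, right)


def count_crown_peaks(heights, prominence_ratio=0.18):
    """Count local maxima whose prominence clears a relative threshold."""
    span = max(heights) - min(heights)
    if span <= 1e-9:
        return 0
    count = 0
    for i in range(1, len(heights) - 1):
        is_peak = heights[i] > heights[i - 1] and heights[i] >= heights[i + 1]
        if is_peak and prominence(heights, i) >= prominence_ratio * span:
            count += 1
    return count


for gaps in [(), (5,), (5, 6)]:
    print(gaps, count_crown_peaks(synthetic_height_profile(gaps=gaps)))
```

The flat gum floor contains no local maximum, so the count drops by exactly one per missing crown regardless of how wide the hole is.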

This answers the open question before trusting the counter on real scans: a true
gap is now handled, and the mark-the-missing-tooth signal still recovers labels.

Tests: 310 pytest (+4), maintainability --strict, 45 UI tests green.

Every segmentation test so far ran on synthetic arches built to present crowns as
clean height peaks. This runs the SAME active segmenter on the real canonical
sample scans (~990k / ~860k vertices) to answer: does the synthetic-tuned
algorithm survive real geometry?

It is a loose smoke check, not an accuracy gate (the scans are unlabelled): no
crash, finishes well under a 30s budget (~0.5s actual), a plausible crown count
within the algorithm's own [floor, canonical] bounds, and valid unique in-arch
labels and confidences. It records the observed counts so the gap is tracked.

Finding it surfaced (documented in the test docstring): the mandibular scan counts
14/14 crowns correctly, but the maxillary scan counts only 7/14 - the upper
occlusal height profile (curve of Spee/Wilson, palate, shorter anteriors) presents
only 7 peaks prominent enough to clear the relative threshold. The lower-arch case
is gated (>= 12) to catch real-data regressions while the upper undercount stays
open as the next task.

Tests: 313 pytest (+3), maintainability --strict green.

The sim-to-real diagnostic showed the crown-peak counter, tuned on synthetic
arches, undercounting the real upper scan 7/14. Cause: at the coarse profile
resolution the flat maxillary occlusal plane (curve of Spee/Wilson, palate)
merges adjacent crowns into a single height peak, so fewer than half register.

Fix: count crowns from a FINER, dedicated height profile (16 buckets/tooth,
prominence ratio 0.18, half-tooth minimum peak separation), decoupled from the
coarser profile the segmenters use to PLACE boundaries. Counting and placement no
longer share a resolution, so merged real crowns resolve without disturbing
boundary placement or per-tooth purity.

- heuristic.teeth_from_profile -> teeth_from_signal(arch, positions, heights):
  builds its own count profile; resolve_tooth_count scales peak separation to the
  profile resolution. hybrid passes positions/heights through.
- Result: maxillary 7 -> 12, mandibular 14 (unchanged). 12/14 is the realistic
  ceiling for 1-D height counting (two upper crowns genuinely merge); the rest is
  a positional guess the user closes via mark-the-gap / re-anchor.
- No regression: synthetic cases unchanged (full 14, missing 13, open-gap 13,
  two-gap 12), flat-fixture fallback holds, mandibular real count held. The
  real-scan diagnostic now gates floors (maxillary >= 10, mandibular >= 12).

Tested against the real sample scans AND the synthetic harness. Tests: 314 pytest
(+1 real-scan regression guard), maintainability --strict green.
… tooth

A proposed count below a full arch is ambiguous: a tooth may be absent, OR two
crowns may have merged into one region (common on the flat upper occlusal plane -
the maxillary 12/14 case from the real-scan fix). The banner previously always
told the reviewer to 'enter the missing tooth', which is misleading for a merge.

- core.countNoteMarkup now takes markedGapCount. With no gap marked it states the
  ambiguity honestly ('Some crowns may have merged into one region, or a tooth may
  be absent ... otherwise review the tooth numbers'). Once the reviewer has marked
  gap(s) it becomes confirmatory ('Proposed N for your M marked gaps - review,
  then apply') and no longer re-prompts.
- render.js passes the marked count (parseMissingTeeth of the missing-teeth field).

Tests: 47 UI tests (+2 for the ambiguous and confirmatory branches).
…draw; Rx guide & prompts

3D viewer / guided wizard (ui/):
- Show the single 3D viewer in the guided "Teeth & time" and "Details" steps,
  not just "3D preview". The viewer is relocated into the active step's host
  (plan/details/preview) by guided.js; in teeth/details the technician toolbar,
  manual-target panel, and caveats are hidden so it reads as a focused aid.
- Visual tooth selection: clicking a tooth in those steps toggles "hold still"
  (folds into fixed_teeth), the visual equivalent of the checkbox list. Held
  teeth render in a blue-grey HELD material. Picking is enabled there without
  requiring confirmed mm units. The Details step forces overlay at the last
  stage so the "Preview movement scale" slider visibly drives on-screen motion.
- Anchor the per-tooth movement proxies onto the uploaded scan instead of
  floating them in schematic arch space below it (the "anchors" in the sample).
  On scan load, computeScanAnchors() fits the schematic arch into each arch's
  world bounding box (x/z) and raycasts the scan surface for the occlusal height,
  caching a per-tooth anchor point + fit scale; update() places and scales the
  proxies on those anchors so movement reads on the scan's own crowns. With no
  scan, the schematic fallback is unchanged. Movement stays simulated/labeled.
  NOTE: WebGL can't render in the sandbox; needs an in-browser visual check.

Tooth-numbering chart (ui/tooth-chart.svg, docs/images/fdi-tooth-map.svg):
- Replaced the flat box chart with a generated occlusal "into the mouth" view:
  pink gum + palate/tongue pads, mucosa depth, soft shadows, glossy enamel with
  cusp detail, anatomically proportional crown sizes, Universal numbers in
  circles + small FDI numbers, quadrant labels. Generated by new committed
  tools/gen_tooth_map.py + tools/tooth_map_draw.py.
- Tooth Map panel + GLOSSARY: define both FDI and Universal and explain simply
  why two numbering systems exist.

Imaging & Photos guide (ui/index.html):
- Split the "starting from scratch" advice into two bullets and added a
  "What is the Rx file?" card (scanner-export prescription is context only, not
  part of the STL, carries no geometry).

Prompts:
- New prompts/ directory: README + transverse-arch-width-sanity-check.md
  (example iTero/OrthoCAD independent arch-width check; educational, not a
  diagnosis).

Verified: node --check (all changed JS), node --test (47), pytest (314 passed,
1 e2e skipped — Playwright not installed), check_maintainability.py --strict.
…swapped

On a two-arch scan, computeScanAnchors() placed each arch's text label on the
occlusal (bite) side - upper label just under the upper arch, lower label just
over the lower arch - so both landed in the bite gap and read as swapped. Place
them on the outside instead: the upper label above the upper arch and the lower
label below the lower arch, clear of the crowns.
…cc 0.63->0.93)

Root cause: both segmenters placed inter-tooth cuts from rigid equal-spacing
nominals, snapping only within +/-half a tooth and clamping a colliding cut to
previous+1. When two nominals wanted the same embrasure that produced a
one-triangle SLIVER region, which shifted every downstream FDI label by one
position - capping the gated full-arch triangle_label_accuracy near 0.63 despite
clean geometry.

Fix: a shared place_cuts() (orthoplan/segmentation/arch_profile.py) selects the
most PROMINENT valleys (or score peaks, for the hybrid segmenter) subject to a
minimum separation, so cuts land on the true embrasures of an anatomically
uneven arch and two cuts can never collapse onto one. Equal spacing remains only
as a fallback to guarantee the required number of distinct boundaries. Rewired
heuristic.find_boundaries and hybrid._find_graph_cut_boundaries onto it.
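
The selection rule can be sketched as follows for the valley case. This is a simplified illustration, not the code in arch_profile.py: the signature is hypothetical, valleys are ranked by raw depth rather than a true prominence measure, and the equal-spacing fallback is a bare-bones version.

```python
def place_cuts(signal, n_cuts, min_separation):
    """Pick up to n_cuts interior cut indices at the deepest local minima,
    never closer together than min_separation samples."""
    n = len(signal)
    minima = [
        i for i in range(1, n - 1)
        if signal[i] < signal[i - 1] and signal[i] <= signal[i + 1]
    ]

    chosen = []
    for i in sorted(minima, key=lambda idx: (signal[idx], idx)):  # deepest first
        if len(chosen) == n_cuts:
            break
        if all(abs(i - j) >= min_separation for j in chosen):
            chosen.append(i)

    # Fallback: fill any shortfall from equal-spacing nominals (simplified;
    # this sketch only skips exact duplicates).
    k = 1
    while len(chosen) < n_cuts and k <= n_cuts:
        nominal = (k * (n - 1)) // (n_cuts + 1)
        if 0 < nominal < n - 1 and nominal not in chosen:
            chosen.append(nominal)
        k += 1
    return sorted(chosen)


# Two deep valleys two samples apart (indices 1 and 3) plus one at index 7.
uneven = [5, 0, 4, 0.5, 5, 5, 5, 1, 5, 5]
print(place_cuts(uneven, n_cuts=2, min_separation=3))
print(place_cuts([1] * 10, n_cuts=3, min_separation=3))
```

In the uneven example the valley at index 3 is rejected because it sits too close to the deeper one at index 1, so the second cut goes to index 7 instead of creating a sliver region.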

Measured on the synthetic accuracy harness:
- full arch:   label 0.63 -> 0.93, region purity 0.83 -> 0.93
- open gap:    label 0.19 -> 0.87
- realistic:   label/purity 0.93 (new gated case)
Raised the full-arch regression floors (purity 0.78 -> 0.88, label 0.55 -> 0.85).

Accuracy harness made realistic so future segmenter work can prove a real-scan
gain: synthetic_arch.build_synthetic_arch now supports uneven crown widths
(realistic_widths: molars wide, incisors narrow), flat molar occlusal plateaus
(occlusal_flat, which merge into one peak and tempt under-counting), and
deterministic per-triangle height noise. Added the gated
segmentation-realistic-arch-accuracy case (uneven + flat + noise, floors 0.85)
plus tests for it and for sliver-free regions on an uneven arch.

Refactor: arch CONSTRUCTION moved to orthoplan/validation/synthetic_arch.py;
segmentation_truth.py keeps scoring and re-exports the construction API (keeps
both files within the maintainability line/function caps).

Note: the real OrthoCAD upper shell still detects 12 of ~14 crowns - that is the
COUNT-detection signal (resolve_tooth_count), a separate lever from boundary
placement and the next accuracy target; the realistic harness is the measurable
floor a learned backend would have to beat.

Verified: pytest 316 passed / 1 skipped, check_maintainability.py --strict,
node --test 47.
…ak separation

resolve_tooth_count counted crown peaks with a minimum separation of half an
AVERAGE tooth. On a real arch the narrow anterior teeth cluster closer in
arc-position (polar angle about the arch centroid) than that, so two real incisor
peaks were merged and the upper arch under-counted at 12/14. The fine count
profile already resolved all 14 peaks - the separation, not the signal, was the
limiter (so 12/14 was not the merged-peak ceiling the docstring assumed).

Tighten _COUNT_SEPARATION_FRACTION 0.5 -> 0.35 (roughly a third of a tooth). Swept the
value against both real shells and every synthetic case: real upper recovers
12 -> 14, lower stays 14, and no synthetic arch over-counts even at 0.3 height
noise (the prominence threshold still rejects noise bumps). The bundled canonical
scans now recover 14/14 on both arches.
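
The effect of the separation fraction can be reproduced on toy numbers. The sketch below is an assumption-laden illustration: positions are in units of one average tooth width, the peaks are swept left to right rather than by prominence, and the function name is hypothetical.

```python
def count_separated_peaks(positions, fraction, mean_tooth_width=1.0):
    """Count peaks after dropping any peak closer than
    fraction * mean_tooth_width to the previously kept one."""
    min_gap = fraction * mean_tooth_width
    kept = []
    for position in sorted(positions):
        if not kept or position - kept[-1] >= min_gap:
            kept.append(position)
    return len(kept)


# Two narrow anterior crowns only 0.4 average tooth widths apart.
peaks = [0.0, 1.0, 1.4, 2.4]
print(count_separated_peaks(peaks, fraction=0.5))
print(count_separated_peaks(peaks, fraction=0.35))
```

At 0.5 the two close peaks merge into one count; at 0.35 both survive, which mirrors the 12 -> 14 recovery described above.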

Lock-in + docs:
- Raised the real-scan crown-count floors (test_segmentation_real_scan) from
  maxillary 10 / mandibular 12 to 14 / 14, and corrected the module docstring
  (the under-count was a separation artefact, not a 1-D height ceiling).
- Confirmed on a real OrthoCAD export (not in the repo): both arches 14/14 with
  correct FDI labels.

Verified: pytest 316 passed / 1 skipped, check_maintainability.py --strict.
… in 3D

Once the user applies a segmentation, the viewer now shows the REAL per-tooth
crowns moving on the scan instead of synthetic proxies on a schematic arch -
closing the "anchors move, not the teeth" gap for segmented scans.

viewer3d.js:
- loadToothFragments() fetches each per-tooth STL fragment (which carries the
  original scan-space triangles), orients it exactly like the shell
  (orientScanGeometry) and deliberately does NOT center it, so the fragments
  reconstruct the arch sitting on the real crowns. Cached by FDI value; cleared
  when the scan changes.
- update() gains a fragment mode (active when fragments are loaded and a scan is
  present): the planned layer draws the real crowns translated by
  worldDeltaOriented() - the orientScanGeometry-consistent movement map (scan
  x,y,z -> world x,z,-y) - while the whole-arch shell remains the static baseline
  (shown in current/overlay, hidden in planned). Crowns are pickable, honour the
  held/selected materials, draw movement lines and tooth-number labels. Per-tooth
  ROTATION is deferred (translation is the unambiguous, high-value motion; correct
  rotation needs a trusted oriented per-tooth frame).
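
The movement map named in the bullet above is an axis remap. The Python below only restates the mapping as written (scan x,y,z -> world x,z,-y) for illustration; the actual worldDeltaOriented() lives in viewer3d.js, and the tuple-based signature is an assumption.

```python
def world_delta_oriented(scan_delta):
    """Map a scan-space translation (x, y, z) to viewer world space
    (x, z, -y), as described for worldDeltaOriented()."""
    x, y, z = scan_delta
    return (x, z, -y)


# A 0.2 mm front-back (scan y) nudge becomes a -0.2 move along world z.
print(world_delta_oriented((0.0, 0.2, 0.0)))
print(world_delta_oriented((1.0, 2.0, 3.0)))
```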

render.js:
- Routes render_meshes whose source is "model-generated" (applied segmentation),
  when a scan is loaded, to loadToothFragments; demo/class meshes keep the
  existing centered + schematic-arch path.

Verified the server data path end to end in Python (segment the bundled canonical
upper scan -> apply the fragment -> evaluate): 14 tooth_meshes, every render_mesh
source "model-generated", 14 tooth_frames. That check caught the provenance value
being the hyphenated "model-generated" (not "model_generated"), which the viewer
filter now matches - otherwise fragment mode would have silently never activated.

The 3D result itself is not renderable in this environment; it needs an in-browser
check (crowns overlay the scan and move sensibly, especially the front-back
direction). The sample test case is unchanged (no applied segmentation -> proxies).

Verified: node --test 47, pytest 316 passed / 1 skipped, maintainability --strict.
… pegs)

For a whole-arch scan that has NOT been segmented there are no per-tooth crowns to
move, so the viewer previously floated synthetic peg crowns anchored on the scan -
which read as "the anchors move, not the teeth". Replace that with an honest
indicator layer: a small teal marker dot on each tooth plus a blue arrow showing
where that tooth is planned to move (length scaled by the preview slider). No fake
crowns; the scan itself stays the teeth.

viewer3d.js:
- New arrowMode (a scan is loaded with anchors but no segmented fragments). Each
  pose draws a marker at the tooth's on-scan anchor and, when planned and the tooth
  is not held still, an arrow via addMovementArrow() (shaft line + shared cone
  head) along worldDeltaOriented() - the same scan-consistent movement map as the
  segmented crowns. Markers carry userData.tooth and reuse the HELD/SELECTED
  materials, so guided click-to-hold and the held tint still work; below a small
  displacement only the marker shows.
- Precedence: fragmentMode (segmented -> real crowns) > arrowMode (scan, no seg ->
  markers+arrows) > schematic proxies (no scan at all, e.g. the educational demo).

This also upgrades the sample test case (canonical scans, no applied segmentation)
from the floating pegs to markers + arrows.

The 3D result is not renderable in this environment; needs an in-browser check of
marker placement and arrow direction (shared worldDeltaOriented sign).

Verified: node --test 47, pytest 316 passed / 1 skipped, maintainability --strict.
…ided UI

In-browser confirmed the new 3D indicators work (teal markers on teeth,
click-to-hold, movement arrows). Make their meaning explicit so users understand
the view:
- Plan (Teeth & time) step: clearer copy - "hold a tooth still" means the plan
  leaves it where it is; untick it in the list or click its dot in the 3D view,
  click again to release.
- New .viewer-legend key under the plan viewer: teal dot = a tooth you can click
  to hold still, blue-grey dot = held still (won't move), blue arrow = which way
  that tooth is planned to move (simulated; the Details slider scales it). Swatch
  colours match the viewer materials.
- Details step: note that the scale slider grows/shrinks the arrows and that
  clicking a dot still holds a tooth.

HTML/CSS only. Verified: node --test 47, maintainability --strict.
…ar of crowns

Cosmetic pass after the in-browser check (TEST.mov confirmed the Details slider
scales the movement arrows correctly).

- Markers were anchored to the incisal/occlusal edge, which faces the bite, so in
  an anterior view both arches' dots bunched into the central gap rather than
  reading as one-per-tooth. Re-anchor each marker onto the tooth's BUCCAL face at
  mid-crown: push outward from the arch centre (scanHx*0.10) and lift toward the
  crown body (span*0.22 from the occlusal edge). Dots now sit on the visible faces
  and the upper/lower rows separate; arrows originate from the face.
- Arch labels: scale the offsets to the arch height (span*0.18 above the upper /
  below the lower, +scanHz*0.35 toward the camera) so they clear the crowns
  instead of sitting among them.

Only arrow-mode marker/arrow positions change; fragment mode (real crowns) and the
no-scan schematic proxies are untouched.

Verified: node --test 47, maintainability --strict. The 3D placement itself needs
an in-browser confirm.
…audit)

Post-implementation audit of the 3D segmentation/arrow work. Three correctness
fixes + test coverage for the new accuracy primitive.

1. loadToothFragments silent failure / unhandled rejection (viewer3d.js)
   A network throw in fetch() rejected the Promise.all, and the caller
   (render.js) has no .catch, so a transient fetch error became an unhandled
   rejection and aborted the whole fragment load. Wrap each fragment fetch in
   try/catch: failures are swallowed per item (the shell still covers that tooth)
   and a partial load renders the crowns that did arrive.

2. Stale-segmentation misalignment (app.js setUploadedFiles)
   Uploading a new scan did not invalidate a previously-applied segmentation.
   Its per-tooth meshes are in the OLD scan's coordinates, so they would be
   loaded and rendered misaligned over the new scan. Reset the segmentation
   (proposal/edits/applied) on a new upload, preserving the user's missingTeeth
   input.

3. Geometry leak on dispose() (viewer3d.js)
   dispose() freed line geometries and label sprites but not the per-viewer
   fragment crowns or the uploaded-scan meshes. Dispose both (the shared
   synthetic/class caches are module-level and intentionally retained).

Tests:
- Direct unit tests for place_cuts (the shared adaptive cut placer): picks the
  deepest valleys at uneven spacing, enforces min-separation (the sliver /
  label-shift regression), selects maxima in peaks mode (hybrid path), and falls
  back to distinct sorted interior cuts on a flat signal. pytest 320 passed.
- Verified the bundled canonical scans are committed so the real-scan crown-count
  floor test (14/14) runs in CI rather than skipping - it is the only guard for
  the count-separation fix (the synthetic harness cannot reproduce the real
  arc-position clustering).

Not changed (flagged, lower-confidence): worldDeltaOriented front-back sign still
needs an in-browser confirm; restorePlan() can re-apply a snapshot's tooth_meshes
against a differently-loaded scan (same misalignment class, intentional path);
arrow-mode marker push/lift magnitudes are visual taste.

Verified: node --test 47, pytest 320 passed / 1 skipped, maintainability --strict.
…long scroll)

In-browser review of the segmentation flow surfaced two UX papercuts:

- "Apply accepted segmentation to plan" was a grey secondary button, so the
  review -> apply next step did not read as actionable. Make it a primary (teal)
  action with a trailing arrow.
- The per-tooth proposal (up to 28 rows) was a single-column list in the panel,
  so reaching the Apply button below it meant scrolling the whole section. Make
  the list a self-contained scrollable box (max-height 320px, its own border,
  tighter rows) so Apply stays in view.

Also noted from the in-browser check (no code change): segment -> apply -> moving
real crowns works and the movement direction reads correctly. The "broken teeth"
look in Planned view is the heuristic segmenter's rough wedge cuts separating
under exaggeration (a segmenter-quality ceiling, not a placement bug; Overlay
keeps the shell so gaps are filled). The learned backend remains the real fix.

HTML/CSS only. Verified: node --test 47, maintainability --strict.
…pe learned backend

Three items from the in-browser review:

(A) Default to Overlay when segmentation is applied (segment.js)
  Applying segmentation now sets state.view="overlay" so the real per-tooth
  crowns move against the static shell, which fills the rough inter-tooth gaps.
  This avoids the "shattered" first impression of Planned view (shell hidden) on
  a heuristic-segmented arch. Users can still switch to Planned.

(B) Prominent, higher "Reading the 3D view" legend (index.html, styles.css)
  Moved the marker/arrow key from below the viewer to an accent card near the top
  of the guided Teeth step, leading with the click-to-hold explanation:
  "Click a tooth to hold it still — it turns blue-grey and the plan leaves it
  where it is." Directly answers the "what does clicking do / why are sections
  grey" confusion (grey = held-still teeth).

(C) Scope doc for the learned segmenter (docs/segmentation-learned-backend.md)
  The rough wedge cuts ("broken teeth") are the heuristic's quality ceiling. The
  doc scopes an ONNX MeshSegNet backend that drops into the existing ToothSegment
  contract (no API/UI/plan-model changes), ships as an optional extra with no
  torch runtime, keeps weights out of git, and is gated by the realistic
  accuracy harness plus a proposed crown-compactness metric. Status: proposal.

HTML/CSS/JS + docs. Verified: node --test 47, maintainability --strict.
Captures everything a fresh chat needs to continue Phase 1 of the learned
tooth-segmentation backend without relying on prior conversation context:

- Safety framing (educational, not a medical device; segmentation proposes,
  never diagnoses).
- Repo/branch (feat/v1.2) and the test gates that must stay green
  (pytest 320/1, check_maintainability.py --strict, node --test 47) plus the
  maintainability caps and local commit/push workflow.
- The drop-in contract to preserve (ToothSegment / load_local_segmenter /
  .segment(...)), and the already-wired downstream
  (segment_payload -> mesh_export -> render_meshes "model-generated" ->
  viewer3d fragment mode) that must NOT need changes.
- Current state: heuristic segmenter, the recent place_cuts + count-separation
  accuracy fixes, and why the per-tooth meshes look "shattered" (rough wedge
  cuts = the quality ceiling the learned backend fixes).
- Measurement gates (realistic synthetic harness + bundled real-scan 14/14
  floor) and the new crown-compactness metric to add.
- Hard constraints (optional ml-seg extra, onnxruntime only / no torch runtime,
  weights+datasets never committed, on-device privacy, heuristic stays the
  fallback) and a concrete Phase 1 definition of done.

Linked from docs/segmentation-learned-backend.md. Docs only.
…both docs

Baked the web-check findings into the scope doc and the handoff prompt so the
next session starts from current facts instead of re-searching:

- New "Model availability (checked June 2026)" section in
  docs/segmentation-learned-backend.md: MeshSegNet has MIT-licensed CODE and
  ships pretrained PyTorch weights (upper+lower), but there is NO ONNX export
  (PyTorch only; GLM layers take adjacency matrices, so export is fiddly), the
  WEIGHTS' license is undocumented (private clinical training data), and it needs
  real per-cell preprocessing (<=10k cells, 15-dim features, per arch, 15 classes
  -> FDI). Teeth3DS+/3DTeethSeg'22 is CC BY-NC-ND 4.0 (non-commercial, no
  derivatives) -> not usable for a reusable build. Cited sources inline.
- Candidate approaches now lead with "export the MIT MeshSegNet weights to ONNX"
  (gated on license clearance), heuristic stays the fallback, weights stay
  user-supplied (never committed).
- Data & licensing and Estimate sections updated; estimate split into Phase 1
  (no model, ~1-2 days) vs the model spike (~2-4 days, license-gated).
- Handoff prompt: replaced the "confirm if an ONNX export is available" line with
  the confirmed availability + licensing facts and an explicit "Phase 1 assumes
  no weights / no torch; model export is a separate spike".

Docs only.
…ack + ml-seg extra + crown-compactness metric)

Optional on-device ONNX segmenter dropping into load_local_segmenter() behind
an install/weights check, with the heuristic as the always-on fallback. No model,
no torch at runtime; weights are user-supplied via $OPENSOURCE_ORTHO_SEG_WEIGHTS
and never committed. Adds the ml-seg extra, a crown-compactness measurement metric
+ lab case, and tests for loader preference/fallback and the label->FDI contract.
Removes the now-obsolete Phase 1 kickoff handoff prompt.
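
The loader preference described above can be sketched roughly as follows. This is an illustration only, not the repository's load_local_segmenter(): the class names, the `_sketch` suffix, and the injectable `runtime_available` / `exists` parameters are assumptions made so the example runs without onnxruntime or real weights. Only the environment variable name comes from the commit message.

```python
import importlib.util
import os


class HeuristicSegmenter:
    """Stand-in for the always-available heuristic backend (hypothetical)."""
    name = "heuristic"


class OnnxSegmenter:
    """Stand-in for the optional learned backend; no model is loaded here."""
    name = "onnx"

    def __init__(self, weights_path):
        self.weights_path = weights_path


def load_local_segmenter_sketch(env=None, runtime_available=None, exists=os.path.isfile):
    """Prefer the ONNX backend only when the runtime AND the weights file exist."""
    env = os.environ if env is None else env
    if runtime_available is None:
        # Probe for the optional extra without importing it.
        runtime_available = importlib.util.find_spec("onnxruntime") is not None
    weights = env.get("OPENSOURCE_ORTHO_SEG_WEIGHTS", "")
    if runtime_available and weights and exists(weights):
        return OnnxSegmenter(weights)
    # Any missing piece falls back to the heuristic rather than raising.
    return HeuristicSegmenter()
```

Every failure mode (no extra installed, variable unset, file missing) lands on the same heuristic branch, which is what keeps the fallback always-on.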
…l context, drop Odysseus)

- Collapsing the chat now slides it off-screen as a pop-out drawer with a fixed
  reopen tab, instead of squashing the panel into a vertical bar.
- Replace the AI Basic/Advanced toggle + free-text model + provider select with a
  single model dropdown; each option carries its provider (Local helper, GPT-5.5,
  GPT-5.4, Claude Opus 4.8, Claude Sonnet 4.7, open-source endpoint).
- Remove the context-scope selector: the assistant always uses the full plan
  context; the egress-consent gate remains the sole control on external sharing.
- Remove the Odysseus connector (kind, catalog, provider build) and its doc/UI
  references; open-source/self-hosted endpoints cover that use case.
- Update OpenAI_Agents.md and ui/README.md for the new model-provider behavior.
…proximity map + scale)

New orthoplan/occlusion/ package. The occlusal grid buckets two opposing arches
into a shared xy grid of per-cell biting-surface heights; the signed clearance is
the substrate the proximity overlay and these metrics share. register_bite trusts a
real export's as-scanned bite (identity) when the arches already occlude, and falls
back to a clearly-flagged approximate alignment otherwise. Adds a synthetic
opposing-arch fixture, an occlusion-registration-accuracy lab case, 7 unit tests
(incl. the real bundled scans), and docs/occlusion-registration.md. No runtime deps,
no API/UI change yet.
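
A minimal sketch of the occlusal-grid idea, under stated assumptions: z is taken as the occlusal axis with the lower arch below the upper one, points are plain (x, y, z) tuples in millimeters, and the function names are hypothetical rather than the orthoplan/occlusion/ API.

```python
def occlusal_heights(points, cell_mm, pick):
    """Bucket (x, y, z) points into xy cells, keeping one z per cell via pick."""
    grid = {}
    for x, y, z in points:
        key = (int(x // cell_mm), int(y // cell_mm))
        grid[key] = z if key not in grid else pick(grid[key], z)
    return grid


def signed_clearance(lower_points, upper_points, cell_mm=1.0):
    """Per-cell gap (upper minus lower); negative values mean the arches overlap."""
    # Assumption: the lower arch bites with its highest point per cell,
    # the upper arch with its lowest point per cell.
    lower = occlusal_heights(lower_points, cell_mm, max)
    upper = occlusal_heights(upper_points, cell_mm, min)
    # Only cells covered by both arches have a defined clearance.
    return {key: upper[key] - lower[key] for key in lower.keys() & upper.keys()}
```

Cells seen by only one arch are left out instead of being given a guessed value, so a sparse scan yields fewer cells, not invented clearances.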
Add ReviewTier model (STL_ONLY / ENHANCED_RECORDS / CBCT_ATTACHED /
ROOT_BONE_AWARE) as the shared classification of a plan's evidence base.
Root/bone-aware is fail-closed: DataAvailability flags or a bare CBCT
attachment can never promote past CBCT_ATTACHED until registration and
reviewed anatomy exist (Phases 5-7).
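
The fail-closed promotion rule can be illustrated with a small sketch. The four tier names are from this commit; the string values, the function name, and its boolean parameters are assumptions for illustration.

```python
from enum import Enum


class ReviewTier(Enum):
    STL_ONLY = "stl-only"
    ENHANCED_RECORDS = "enhanced-records"
    CBCT_ATTACHED = "cbct-attached"
    ROOT_BONE_AWARE = "root-bone-aware"


def classify_review_tier(enhanced_records=False, cbct_attached=False,
                         registration_ready=False, trusted_anatomy=False):
    """Promote past CBCT_ATTACHED only when every prerequisite is present."""
    if cbct_attached and registration_ready and trusted_anatomy:
        return ReviewTier.ROOT_BONE_AWARE
    if cbct_attached:
        # A bare attachment (or availability flags alone) stops here.
        return ReviewTier.CBCT_ATTACHED
    if enhanced_records:
        return ReviewTier.ENHANCED_RECORDS
    return ReviewTier.STL_ONLY
```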

Persist a queryable CaseProvenance digest (scan provenance, units, arch,
modality, file ids, engine version, review tier) on each PlanVersion and
surface it in case_api list/version payloads. Add review_tier to the
evaluate output and a review-tier banner to the browser UI.

Completes Phase 1 intake items.

Tests: pytest tests/test_review_tier.py tests/test_case_api.py tests/test_api.py tests/test_examples.py; node --test (ui)
Make the scan-scale gate a consistent cross-rule contract: the segmented
crown collision check now defers with a NOTICE when scan units are
unverified, mirroring the existing movement-cap gate, so no millimeter
finding is reported on untrusted scale.

Add review tier and an explicit unresolved-anatomy gap list (roots,
alveolar bone, periodontal status, occlusion, CBCT anatomy) to the
handoff report. The list is fail-closed: every blind domain is named
unless the plan reaches root/bone-aware review.

The remaining Phase 2 items (auto-segmentation via /api/segment, the
segmentation review UI, per-tooth fragment rendering, segmented-crown
movement, and the shared-engine checks) were already in place; this
commit closes the scale-gate and report items.

Tests: pytest tests/test_collision_optimizer_cases.py tests/test_reporting.py tests/test_review_tier.py
Stage print exports now transform actual per-tooth fragment vertices for
reviewed segmentation links whose meshes resolve in the workspace. Every
other tooth (unreviewed link, or unresolvable/missing geometry) falls
closed to a clearly-labeled schematic proxy box - a SegmentedToothMesh
gains a 'reviewed' flag that gates real-vertex use.

Bump the manifest to v2: add a hashes block (plan, stage frames,
findings, original scans, segmentation fragments) alongside the existing
per-artifact hashes, plus the review tier label and a
uses_real_mesh_geometry flag. The print-package payload echoes the tier
and the real-geometry flag.

Split STL geometry generation into orthoplan/print_stl.py to keep both
modules under the maintainability cap (geometry vs packaging).

Tests: pytest tests/test_printing.py (real-mesh, fail-closed x2, v2 manifest)
Add an optional 'dicom' extra (pydicom). New dicom_intake module parses
ONLY structural study metadata (modality, voxel spacing, dimensions,
orientation, study date) with stop_before_pixels; patient identifiers are
never copied (PHI_TAGS_EXCLUDED) and volume bytes never enter plan JSON.
Intake fails closed when the extra is absent.

CaseRecord gains an optional redacted DicomMetadata; record_workspace
parses it for cbct/dicom records. Add a fail-closed CBCT lifecycle status
(unavailable/attached/registered/anatomy-reviewed) and a 3D Slicer
handoff path (cbct_handoff) surfaced in the evaluate output and a CBCT
panel in the browser UI. Lock the invariant that a CBCT attachment does
not change movement generation.

Tests: pytest tests/test_dicom_intake.py (PHI redaction, fail-closed,
status, handoff, generation invariance); node --test (ui)
Add RegistrationTransform (source STL asset, target CBCT record, 4x4
affine matrix, method, operator/model provenance, RegistrationQuality,
notes) with cross-reference validation on the plan: a registration can
only point at a real mesh asset and a real CBCT/DICOM record.

Acceptance is fail-closed (is_acceptable = accepted AND quality present).
registration_ready/accepted_registration gate CBCT-derived behavior;
cbct_status reports REGISTERED only when ready; root/bone-aware still
requires reviewed anatomy on top. Manual and imported transforms work
today; an Open3D ICP experiment (registration_auto) is gated behind the
mesh-processing extra and fails closed when absent.
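
A hedged sketch of the acceptance rule (is_acceptable = accepted AND quality present) together with a basic 4x4 affine check. The class name, the single `quality_rms_mm` field standing in for RegistrationQuality, and the bottom-row check are assumptions; the real model carries more provenance.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


@dataclass(frozen=True)
class RegistrationSketch:
    matrix: Sequence[Sequence[float]]
    accepted: bool = False
    quality_rms_mm: Optional[float] = None  # hypothetical stand-in for RegistrationQuality

    def __post_init__(self):
        rows = [list(row) for row in self.matrix]
        if len(rows) != 4 or any(len(row) != 4 for row in rows):
            raise ValueError("registration matrix must be 4x4")
        if not all(math.isfinite(value) for row in rows for value in row):
            raise ValueError("registration matrix must be finite")
        if rows[3] != [0, 0, 0, 1]:
            raise ValueError("affine matrix must end with the row 0 0 0 1")

    @property
    def is_acceptable(self):
        # Fail closed: an accepted flag without recorded quality is not enough.
        return self.accepted and self.quality_rms_mm is not None
```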

Expose registration + quality in the evaluate output and the browser
CBCT panel.

Tests: pytest tests/test_registration.py (matrix validation, fail-closed
acceptance, cross-ref rejection, gating, Open3D-absent path)
Add provenance-bound, fail-closed derived-anatomy models: RootGeometry
(per-tooth root mesh and/or centerline), ToothAxis (trusted long axis),
and AlveolarBoneRecord, all carrying source CBCT record, registration id,
model/operator provenance, confidence, and a ReviewStatus. An object is
'trusted' only when accepted/corrected AND in field; proposed, rejected,
uncertain, or out-of-field anatomy is never trusted.
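
The trust rule reads naturally as a small predicate. A sketch, assuming a five-value ReviewStatus and an `in_field` flag; the class and function names here are illustrative, not the repository's models.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    CORRECTED = "corrected"
    REJECTED = "rejected"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class DerivedAnatomySketch:
    status: ReviewStatus = ReviewStatus.PROPOSED
    in_field: bool = False

    @property
    def trusted(self):
        # Both conditions are required; the defaults are deliberately untrusted.
        return self.in_field and self.status in (ReviewStatus.ACCEPTED, ReviewStatus.CORRECTED)


def root_bone_aware_ready_sketch(registration_ready, objects):
    """Ready only with an accepted registration AND at least one trusted object."""
    return bool(registration_ready) and any(obj.trusted for obj in objects)
```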

Plan gains a derived_anatomy container with reference validation (every
object must trace to a real CBCT record, registration, and mesh).
root_bone_aware_ready now requires registration_ready AND a trusted
object, so root/bone-aware review (and the ROOT_BONE_AWARE tier /
anatomy-reviewed CBCT status) only unlocks behind real reviewed anatomy.

Surface per-object trust flags in the evaluate output and add a browser
review panel with accept/correct/reject controls that re-evaluate.

Tests: pytest tests/test_anatomy.py (trust logic, reference rejection,
fail-closed tiers, per-object trust flags)
Add a deterministic root/bone-aware rule that runs only behind trusted
CBCT-derived anatomy: root proximity and inter-root collision (on
reviewed root centerlines, after planned movement), cortical-boundary
proximity (against reviewed alveolar bone bounds), and root/bone context
for tip/torque/intrusion/extrusion/expansion movements on teeth with
reviewed root/axis anatomy.

Fail-closed: when a CBCT is attached but registration, segmentation, or
reviewed anatomy is insufficient, it emits 'cannot assess' notices rather
than guessing; STL-only plans stay silent (the data-gap layer already
reports CBCT unavailable). The structured review verdict is limited to
CONSISTENT, ISSUES, NOT_APPLICABLE and is surfaced in the evaluate output.

Tests: pytest tests/test_root_bone.py (fixture geometry: not-applicable,
cannot-assess, proximity ISSUES, cortical breach, movement context)
Add a browser-generated stored-review export endpoint for mobile handoff, including review tier, data gaps, CBCT/root-bone status, edit-lock metadata, handoff URLs, and review hashes.

Harden the export contract by percent-encoding case IDs in links, accepting only http/https base URLs, making review_sha256 verifiable by excluding itself from the digest payload, and sanitizing browser download filenames.
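
To make the hardening concrete, here is a minimal sketch of a self-excluding digest plus link building. The canonical JSON form (sorted keys, compact separators), the helper names, and the `/cases/` path are assumptions; only the review_sha256 field name, the http/https restriction, and percent-encoding of case IDs come from the commit.

```python
import hashlib
import json
from urllib.parse import quote, urlsplit


def review_digest(review):
    """SHA-256 over the review with its own digest field left out."""
    payload = {key: value for key, value in review.items() if key != "review_sha256"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def seal(review):
    """Return a copy of the review carrying its digest."""
    return {**review, "review_sha256": review_digest(review)}


def verify(review):
    """Recompute the digest from the stored document and compare."""
    return review.get("review_sha256") == review_digest(review)


def handoff_url(base_url, case_id):
    """Build a reopen link; the /cases/ path is a placeholder."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("base URL must be http or https")
    # safe="" also encodes "/" so an id cannot add path segments.
    return f"{base_url.rstrip('/')}/cases/{quote(case_id, safe='')}"
```

Because the digest field is excluded from its own input, a client can verify a stored review without knowing how it was sealed.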

Wire the browser export action and document the mobile API contract. Add pure export tests plus live server route coverage for encoded handoff links.

Verification: pytest; pytest tests/test_server.py; pytest tests/test_case_review.py tests/test_mobile_contract.py; npm test
Add a case-review export: a self-contained, opaque JSON document a mobile
client stores as a read-only review. It carries the review tier,
unresolved data gaps, finding counts, CBCT and root/bone verdicts, an
explicit edit-lock (requires_browser_engine), and a content digest.

Add a handoff descriptor (open URL / deep link / QR payload) for
reopening the same local or hosted case on a device; base URLs are
validated to http/https and ids are percent-encoded. Expose the builder
via POST /api/case-review (server dispatch refactored into a helper) and
a browser 'Export case review (mobile handoff)' action. Document the
endpoint and the mobile display requirements (tier, gaps, edit-lock) in
the mobile API contract.

Tests: pytest tests/test_case_review.py tests/test_server_case_review.py
Add an orthoplan-case-review-v1 golden fixture and server-side golden regression coverage so mobile schema drift is caught.

Teach iOS and Android to decode validated stored-review JSON, reject non-schema browser-review imports, display review tier/data gaps/edit-lock status, and expose browser/deep-link open paths from the handoff payload.

Add a self-contained browser QR SVG renderer for case handoff payloads, render the QR after case-review export, and cover the renderer with UI unit tests.

Verification: pytest; pytest tests/test_case_review.py tests/test_case_review_golden.py tests/test_mobile_contract.py; npm test; swift test; ANDROID_HOME=/Users/johnlaw/Library/Android/sdk gradle testDebugUnitTest; xcodebuild build -scheme OpenSourceOrthoLite -project mobile/ios/OpenSourceOrthoLite.xcodeproj -destination 'platform=iOS Simulator,name=iPhone 17,OS=26.5'
Add the manufacturing step that turns a stage model into a printable
aligner shell. aligner_shell.py offsets the reviewed stage surface
outward along vertex normals by the sheet thickness and closes it into a
watertight solid (outer offset + reversed inner cavity + stitched rim),
with an optional gingival trim plane.
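
The outward offset step can be sketched as below. This covers only the vertex-normal offset; the cavity reversal, rim stitching, and trim plane are left out, and the function names are hypothetical. Normals are area-weighted here, which is one common choice and an assumption about the real module.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def vertex_normals(vertices, triangles):
    """Area-weighted unit normals per vertex (zero vector for unused vertices)."""
    sums = [[0.0, 0.0, 0.0] for _ in vertices]
    for a, b, c in triangles:
        # The cross product's length is twice the triangle area, so larger
        # triangles contribute more to their vertices' normals.
        n = _cross(_sub(vertices[b], vertices[a]), _sub(vertices[c], vertices[a]))
        for index in (a, b, c):
            for axis in range(3):
                sums[index][axis] += n[axis]
    normals = []
    for s in sums:
        length = math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2)
        normals.append(tuple(x / length for x in s) if length else (0.0, 0.0, 0.0))
    return normals


def offset_outward(vertices, triangles, thickness_mm):
    """Move every vertex along its normal by the sheet thickness."""
    normals = vertex_normals(vertices, triangles)
    return [tuple(p[i] + thickness_mm * n[i] for i in range(3)) for p, n in zip(vertices, normals)]
```

On curved or noisy surfaces this per-vertex offset is only an approximation of a true offset surface, and it can self-intersect.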

print_aligner.py emits a per-stage shell STL from real reviewed geometry
only (never proxy teeth); the gingival trim is derived from trusted
CBCT tooth axes and is fail-closed (no trim when the occlusal direction
is unknown). Print settings gain aligner_shell_enabled, sheet_thickness_mm,
and gingival_trim_margin_mm; the manifest records shell artifacts with
thickness, watertight flag, trim status, and hashes. Browser print panel
gets the shell toggle + thickness/trim inputs.

This is geometry generation, not a clinical claim: printing, fit, and
physical use remain the user's own responsibility and risk. A robust
mesh-library offset/boolean path is left as a future enhancement over the
pure-Python vertex-normal approximation.

Rewrote TODO.md: condensed the completed v1.2 phases and added the
effectiveness roadmap (phases 9-14) targeting each honest-rating track to
>= 7/10.

Tests: pytest tests/test_aligner_shell.py tests/test_printing.py
Use trusted reviewed root geometry and tooth axes to rotate real per-tooth mesh vertices about a root-derived center of resistance.

Expose the movement model in evaluation output while failing closed to the existing crown-centroid visualization path when trusted anatomy is unavailable.

Add regression coverage for unchanged no-root movement output and root-apex opposite motion under tipping.
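
As an illustration of why the root apex moves opposite to the crown under tipping, here is a small sketch that rotates points about an axis through a chosen center using Rodrigues' rotation formula. The function name and the example coordinates are assumptions; how the repository derives the center of resistance is not shown here.

```python
import math


def rotate_about_axis(point, center, axis, angle_rad):
    """Rotate point about the line through center along axis (Rodrigues' formula)."""
    length = math.sqrt(sum(a * a for a in axis))
    if length == 0.0:
        raise ValueError("rotation axis must be non-zero")
    kx, ky, kz = (a / length for a in axis)
    vx, vy, vz = (point[i] - center[i] for i in range(3))
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    dot = kx * vx + ky * vy + kz * vz
    # k x v
    cx, cy, cz = ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx
    rotated = (
        vx * cos_t + cx * sin_t + kx * dot * (1.0 - cos_t),
        vy * cos_t + cy * sin_t + ky * dot * (1.0 - cos_t),
        vz * cos_t + cz * sin_t + kz * dot * (1.0 - cos_t),
    )
    return tuple(rotated[i] + center[i] for i in range(3))


# Hypothetical tooth: center of resistance at the origin, crown tip above, apex below.
CENTER = (0.0, 0.0, 0.0)
crown_tip = rotate_about_axis((0.0, 0.0, 5.0), CENTER, (0.0, 1.0, 0.0), math.radians(5))
root_apex = rotate_about_axis((0.0, 0.0, -5.0), CENTER, (0.0, 1.0, 0.0), math.radians(5))
```

With the pivot between crown and apex, the two x-displacements have opposite signs. With a crown-centroid pivot, both would move the same way.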
Store capped representative surface samples on segmented tooth links and populate them from local segmentation.

Use bbox prefiltering plus transformed adjacent same-arch sample distances to report contact and estimated IPR, with scale gating and a labeled bbox fallback when samples are unavailable.

Add overlap, clear-pair, fallback, and segmentation-fragment coverage.
Add a synthetic validation benchmark report covering segmentation Dice/IoU, movement millimeter error, collision/IPR precision-recall, and shell thickness error.
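
For reference, the two segmentation overlap metrics named above can be computed from label sets as in this sketch. Treating a tooth as a set of face or vertex ids, and scoring two empty sets as a perfect match, are assumptions about the benchmark's conventions.

```python
def dice_iou(predicted, truth):
    """Return (Dice, IoU) for two collections of element ids."""
    p, t = set(predicted), set(truth)
    intersection = len(p & t)
    union = len(p | t)
    if union == 0:
        # Assumption: two empty sets count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / (len(p) + len(t))
    iou = intersection / union
    return dice, iou
```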

Expose the report through the validation package and a validation-benchmark CLI command with JSON output.

Keep benchmark metrics as tracked reported numbers rather than pass/fail gates, with caveats for synthetic fixtures and future reviewed open datasets.
Clean shell input geometry by welding near-duplicate vertices, dropping degenerate triangles, and failing closed when no usable reviewed surface remains.
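
A rough sketch of the cleanup pass. It welds by snapping coordinates to a tolerance grid, which is a simplification (two points straddling a grid boundary may not merge), and it drops only triangles that collapse to repeated indices, not zero-area slivers. The function name and the 1e-6 default tolerance are assumptions.

```python
def weld_and_clean(vertices, triangles, tolerance=1e-6):
    """Merge near-duplicate vertices, drop collapsed triangles, fail closed if empty."""
    index_of = {}
    welded = []
    remap = []
    for vertex in vertices:
        key = tuple(round(coord / tolerance) for coord in vertex)
        if key not in index_of:
            index_of[key] = len(welded)
            welded.append(vertex)
        remap.append(index_of[key])
    kept = []
    for a, b, c in triangles:
        a, b, c = remap[a], remap[b], remap[c]
        if len({a, b, c}) == 3:  # a repeated index means the triangle collapsed
            kept.append((a, b, c))
    if not kept:
        # Fail closed instead of returning an empty surface.
        raise ValueError("no usable surface remains after cleanup")
    return welded, kept
```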

Expand shell QA with thickness distribution, watertightness, connected components, cleanup counts, shell hashes, and per-stage fail-closed reports.

Surface manufacturing-readiness verdicts in API/export status and print-package manifests using CONSISTENT, ISSUES, and NOT_APPLICABLE vocabulary.
Downgrade Track 1 to an honest ~7/10 and add docs/application maturity.md to define the three tracked maturity surfaces, current scores, gaps, and 10/10 criteria.

Extend shell manufacturing QA with printer tolerance settings, rim closure, approximate self-intersection signals, inner/outer clearance, sliver reporting, API shell-QA readiness findings, and print-package manufacturing summaries.

Refresh stale CBCT/root-bone documentation, regenerate examples and the mobile case-review golden fixture, and add coverage for inverted winding, disconnected islands, skinny triangles, bad geometry, and package-payload QA summary.
Track 1 (upload -> printable aligner artifacts) accuracy + surfacing.

Accuracy:
- build_aligner_shell now bakes printer XY/Z dimensional compensation into the
  exported shell geometry instead of only reporting it in the manifest. The bias
  is applied along vertex normals (XY gain in-plane, Z gain on the build axis) to
  BOTH the inner cavity and outer surfaces, so it shifts the part's outer
  dimensions to cancel printer over/under-cure without changing wall thickness.
  Applied values are recorded in ShellStats and echoed back through the shell QA
  block. Previously the manifest advertised a compensation profile the STL did
  not contain.
- solid_stl now writes the real unit outward facet normal per triangle, computed
  from winding (right-hand rule), instead of a placeholder "0 0 0". Degenerate
  facets still emit 0 0 0. Applies to both stage-model and shell exports, keeping
  the export spec-correct for strict CAD/validation tools.
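
The facet-normal rule can be sketched as follows: a right-hand-rule normal from the triangle's winding, with the zero vector for degenerate facets. The ASCII facet formatter and both function names are illustrative assumptions; the repository's solid_stl may format differently.

```python
import math


def facet_normal(a, b, c):
    """Unit normal from counter-clockwise winding; zeros for a degenerate facet."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return (nx / length, ny / length, nz / length)


def ascii_facet(a, b, c):
    """Format one triangle as an ASCII STL facet block."""
    nx, ny, nz = facet_normal(a, b, c)
    lines = [f"facet normal {nx:g} {ny:g} {nz:g}", "  outer loop"]
    lines += [f"    vertex {x:g} {y:g} {z:g}" for x, y, z in (a, b, c)]
    lines += ["  endloop", "endfacet"]
    return "\n".join(lines)
```

Swapping any two vertices flips the normal, so the winding produced upstream decides which side a strict validator treats as outside.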

UI surfacing:
- The guided print step and the technician print panel now show the
  manufacturing-readiness verdict, the applied printer compensation, and a
  per-stage shell QA view (watertight / thickness range / self-intersection
  signals / skip reason). The backend already computed and returned this data;
  it was previously invisible to the user.

Tests:
- Shell compensation defaults/Z-shift/wall-thickness-preservation, manifest
  compensation reporting, real STL facet normals, and DOM-free UI rendering of
  the print QA panel.

Docs: application maturity Track 1 "what exists" updated; TODO status note added.
…, oracle

Raises Track 1 (upload -> printable aligner artifacts) to ~8/10 by removing
three of the gating limitations: approximate self-intersection, opaque verdicts,
and no independent known-good comparison.

Self-intersection + nonmanifold engine:
- New orthoplan/mesh_intersect.py implements a deterministic Möller (1997)
  triangle-triangle intersection test, pure-Python and dependency-free.
- count_self_intersections now uses it as an exact narrow phase behind the
  existing AABB broad phase (made inclusive so a true crossing on a shared
  boundary plane, or a triangle lying in a coordinate plane, is not pre-rejected).
  This replaces the box-overlap approximation that both over-counted and
  under-proved.
- Added count_nonmanifold_edges and a nonmanifold_edge_count shell stat (edges
  shared by more than two faces).
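
The nonmanifold-edge definition above (an edge shared by more than two faces) fits in a few lines. This sketch works on index triples and carries a `_sketch` suffix because it is not the repository function.

```python
from collections import Counter


def count_nonmanifold_edges_sketch(triangles):
    """Count undirected edges that are used by more than two triangles."""
    uses = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Sort the pair so (u, v) and (v, u) count as the same edge.
            uses[(min(u, v), max(u, v))] += 1
    return sum(1 for count in uses.values() if count > 2)
```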

Per-artifact QA explanations:
- print_aligner._failed_checks produces a named list of exactly which
  deterministic checks downgraded a shell to ISSUES (watertight, nonmanifold,
  disconnected pieces, rim closure, self-intersection at zero tolerance, thin
  walls, inner/outer clearance). With a real engine, the self-intersection
  tolerance is now zero rather than the approximation's allowance. The verdict is
  derived from this list, and the manifest + guided/technician print UI surface
  the named reasons.

Verification (independent ground truth + messy corpus):
- tests/test_mesh_intersect.py: engine cases (piercing, separated, parallel,
  coplanar overlap/disjoint) and the count functions.
- tests/test_shell_quality_engine.py: a closed-form slab-volume oracle (math, not
  the builder) confirming the flat-quad shell encloses area*thickness; flat-input
  compensation is a pure translation; and synthetic messy fixtures whose defects
  the QA must name.
- ui/print_qa.test.js: failed_checks rendering.
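
The closed-form oracle idea can be illustrated with a signed-volume check: for a closed, outward-wound mesh, the sum of signed tetrahedron volumes against the origin equals the enclosed volume, which for a flat slab must be area*thickness. The `box_mesh` helper below is a hypothetical fixture, not the builder under test.

```python
def signed_volume(vertices, triangles):
    """Enclosed volume of a closed, outward-wound triangle mesh."""
    total = 0.0
    for a, b, c in triangles:
        ax, ay, az = vertices[a]
        bx, by, bz = vertices[b]
        cx, cy, cz = vertices[c]
        # a . (b x c) is six times the signed volume of tetrahedron (origin, a, b, c).
        total += ax * (by * cz - bz * cy) + ay * (bz * cx - bx * cz) + az * (bx * cy - by * cx)
    return total / 6.0


def box_mesh(width, height, thickness):
    """Axis-aligned slab with outward-facing winding (illustrative fixture)."""
    w, h, t = width, height, thickness
    vertices = [(0, 0, 0), (w, 0, 0), (w, h, 0), (0, h, 0),
                (0, 0, t), (w, 0, t), (w, h, t), (0, h, t)]
    triangles = [
        (0, 2, 1), (0, 3, 2),  # bottom, normal -z
        (4, 5, 6), (4, 6, 7),  # top, normal +z
        (0, 1, 5), (0, 5, 4),  # front, normal -y
        (3, 7, 6), (3, 6, 2),  # back, normal +y
        (0, 4, 7), (0, 7, 3),  # left, normal -x
        (1, 2, 6), (1, 6, 5),  # right, normal +x
    ]
    return vertices, triangles
```

The expected value (width * height * thickness) comes from arithmetic alone, which is what makes the comparison independent of the code being checked.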

Docs: application maturity Track 1 rated ~8/10 with updated gaps; TODO snapshot
and status updated.
… fallback)

Starts the real 8->9 lever for Track 1 (a robust mesh backend) with a safe,
mergeable first slice: backend selection, an optional Open3D repair path, and a
fail-closed fallback. The rating is intentionally NOT bumped to 9 - the robust
offset is repair-only and unvalidated in CI (Open3D is not installed), exactly
like the existing automatic-registration experiment.

Backend:
- New orthoplan/aligner_shell_robust.py: optional Open3D mesh-repair shell
  backend behind the mesh-processing extra (merge near-duplicate vertices, drop
  degenerate triangles, remove non-manifold edges, orient consistently, recompute
  robust normals), then reuse the shared offset. Mirrors registration_auto.py:
  guarded import, robust_shell_available() probe, RobustShellUnavailable, and a
  pragma-no-cover heavy-import body.
- settings.shell_backend ("pure-python" default | "robust"). When robust is
  requested but Open3D is missing, the export falls back to pure-Python and
  records the downgrade (fallback_reason) - it never silently changes geometry.
- resolve_shell_backend() identity (requested/used/available/fallback_reason)
  flows into the manifest (aligner_shells.backend), PrintPackageResult, the API
  response (aligner_shell_backend), and the guided/technician print QA UI.
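
A sketch of the backend resolution contract, assuming the four identity fields named above (requested/used/available/fallback_reason). The dataclass, the `_sketch` function name, the reason string, and the injectable `robust_available` flag are illustrative assumptions.

```python
import importlib.util
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackendChoice:
    requested: str
    used: str
    available: bool  # whether the robust (Open3D) path could be used
    fallback_reason: Optional[str]


def resolve_shell_backend_sketch(requested="pure-python", robust_available=None):
    """Resolve the backend, recording any downgrade instead of hiding it."""
    if requested not in ("pure-python", "robust"):
        raise ValueError(f"unknown shell backend: {requested!r}")
    if robust_available is None:
        # Probe for Open3D without importing the heavy module.
        robust_available = importlib.util.find_spec("open3d") is not None
    if requested == "robust" and not robust_available:
        return BackendChoice(requested, "pure-python", False, "open3d is not installed")
    return BackendChoice(requested, requested, robust_available, None)
```

Keeping `requested` and `used` as separate fields is what lets a manifest show that a downgrade happened.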

Refactor (single source of truth, maintainability caps):
- Extracted assemble_shell() in aligner_shell.py so both backends share the
  offset, rim stitching, printer compensation, and QA block.
- Moved mesh indexing/topology helpers into new aligner_shell_topology.py to keep
  aligner_shell.py under the 300-line file cap.

Tests + fixtures:
- tests/test_shell_backend.py: default backend, resolution, fail-closed fallback
  when Open3D is absent, and the manifest recording the downgrade; plus a UI test
  for the backend line.
- Regenerated mobile/fixtures/case-review-v1.json and examples/*.json for the new
  shell_backend settings field.

Docs: docs/aligner-shell-backend.md (status, contract, fail-closed behavior, what
a validated 9/10 needs); maturity Track 1 + TODO Phase 9 follow-up updated.
Roadmap-only update (no code change).

- Added an ordered "Path to Track 1 ~9/10" section sequencing the remaining work:
  9.1 scale the pure-Python shell QA -> 9.2 true boolean/SDF (Minkowski) offset in
  the robust backend -> 9.3 install Open3D in a test env and validate the robust
  backend vs pure-Python QA on a messy corpus (the actual ~8 -> ~9 move) -> 9.4
  full-arch known-good fixtures from an independent mesh pipeline. Noted that a
  10/10 is intentionally off-path: it would require material/thermoforming/fit/
  printer-calibration/physical validation that this safety-boundary-first toolkit
  deliberately does not model.

- Added Phase 9.1 (PRIORITY) for the spatial-grid fix, with the reason it is a
  priority: the real triangle-triangle self-intersection engine and the
  min_inner_outer_clearance check are O(n^2)/O(V^2) and run on every pure-Python
  shell build (measured ~16.7s at ~8,460 shell triangles). Because the
  pure-Python path is the always-on default, this must land before the shell QA
  is run on real multi-tooth reviewed plans, or per-stage builds will hang. Tasks:
  spatial-grid broad phase for self-intersection, spatial nearest-neighbor for
  clearance, a wall-clock perf regression test, and identical results on existing
  fixtures.

- Relabeled the robust-backend follow-ups as Phase 9.2-9.4 to match the path.
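
The spatial-grid broad phase proposed for Phase 9.1 could look roughly like this sketch: triangles are binned by the grid cells their bounding boxes cover, and only triangles sharing a cell are compared. All names and the cell size are assumptions. The exact triangle-triangle narrow phase is not shown; it would run on the returned candidate pairs.

```python
import math
from collections import defaultdict
from itertools import combinations


def aabb(triangle):
    """Axis-aligned bounding box of one triangle given as three (x, y, z) points."""
    xs, ys, zs = zip(*triangle)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_touch(box_a, box_b):
    """Inclusive overlap test, so boxes sharing only a boundary still count."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))


def candidate_pairs(triangles, cell_mm):
    """Index pairs (i < j) whose boxes touch, found via a uniform grid."""
    boxes = [aabb(triangle) for triangle in triangles]
    grid = defaultdict(list)
    for index, (lo, hi) in enumerate(boxes):
        spans = [range(math.floor(lo[i] / cell_mm), math.floor(hi[i] / cell_mm) + 1)
                 for i in range(3)]
        for ix in spans[0]:
            for iy in spans[1]:
                for iz in spans[2]:
                    grid[(ix, iy, iz)].append(index)
    pairs = set()
    for members in grid.values():
        # Members are appended in index order, so each pair comes out as (i, j) with i < j.
        for i, j in combinations(members, 2):
            if boxes_touch(boxes[i], boxes[j]):
                pairs.add((i, j))
    return pairs
```

Any two touching boxes share at least one point, and that point's cell is covered by both, so the grid finds the same pairs as the all-pairs scan. That is the parity property the roadmap asks the tests to confirm.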
Roadmap-only update (no code change).

Captures the current clunky state of the in-app assistant, grounded in the code:
- single-turn with no memory (answer_chat_payload sends only [user, assistant];
  no prior turns are threaded back), so a real back-and-forth is impossible;
- no streaming (full answer appears at once after a static status line);
- one combined provider+model <select> with a hardcoded option list and models
  "configured externally" (no per-provider model selection);
- renderChat rebuilds chatMessages.innerHTML on every render, which churns
  scroll position and focus and rules out incremental append.

Adds goals in two groups plus safety constraints:
- Goal A (conversational flow): thread bounded conversation history, incremental
  append with preserved scroll/auto-scroll/focus, pending indicator +
  Enter-to-send/Shift+Enter, and token streaming with a non-streaming fallback.
- Goal B (Cursor-style selection): split into provider then model, give each
  connector a real selectable model list (plus free-text for self-hosted),
  surface key/PHI affordances, keep the local helper as the default no-key option.
- Safety (unchanged): keep model output separate from deterministic findings with
  the lint_finding() gate, per-request credentials never stored, and the
  PHI-share acknowledgement + shares_patient_data labeling before any non-local
  provider receives plan context.
Roadmap/docs-only update (no code change). Makes the docs consistent with the
directive that every focus area is a committed ≥9/10 target - previously only
Track 1 had a documented target and ordered path.

application maturity.md:
- Summary table gains a Target column; all surfaces marked ≥9/10.
- Added per-track "Target: ≥9/10" lines pointing at the TODO ordered paths.
- Reframed each "What 10/10 would require" list as "What reaching the ≥9/10
  target requires"; reaffirmed in the intro that 10/10 is intentionally NOT a
  target for the geometry tracks (no material/fit/physical-use modeling).
- Added Track 4: In-App AI Assistant (Chat) as a tracked surface (~4/10 -> ≥9/10)
  with what-exists, why-not-higher, and the path requirements.

TODO.md:
- Honest-effectiveness snapshot gains a Target column and a chat row.
- Replaced the single "Path to Track 1 ~9/10" with a "Targets: all four surfaces
  to ≥9/10" section containing ordered paths for Tracks 1-4, each referencing the
  existing phases.
- Added Phase 16 (full triangle-level collision/IPR for Track 2) so the Track 2
  path reference resolves.
Roadmap/docs-only update (no code change).

TODO.md:
- Replaced the four separate per-track "Path to ~9/10" blocks with one global
  "Order of operations to ≥9/10" section, grouped into dependency waves:
    Wave 0  Phase 9.1 (shell QA perf - unblocks real arch use, do first)
    Wave 1  Phase 15 (chat) + Phase 13 (benchmark corpus) - independent, parallel
    Wave 2  Open3D test env -> 9.2/9.3 (robust offset + validation) -> 16 -> 9.4
    Wave 3  Phase 14 (learned segmentation, benchmarked)
    Wave 4  Phase 12a -> 12b -> 12c (CBCT raw volume; longest road)
- Reordered the detailed phase sections to match that execution order (Phase 15
  now precedes Phase 16) and tagged each section with its wave.
- Made the shared Open3D test-environment prerequisite explicit in Wave 2.

Removed stale info:
- The "current status" QA bullet no longer describes self-intersection as
  "approximate signals" - the real triangle-triangle engine replaced it.
- application maturity.md cross-references now point at the new
  "Order of operations" section instead of the deleted per-track "Path to" blocks.
Update TODO.md to remove stale status, mark completed shell QA, benchmark, chat, and initial triangle-contact work, and document the remaining streaming/Open3D/full-geometry gaps.

Scale pure-Python aligner shell QA by replacing quadratic self-intersection broad phase and inner/outer clearance scans with spatial-grid based exact checks. Add parity and full-arch-scale performance coverage.

Expand validation benchmarks with reviewed non-PHI corpus metadata, baseline deltas, messy shell metrics, and sampled-vs-triangle collision distance metrics. Split benchmark models and helper registries into focused modules to keep maintainability guardrails green.

Improve plan AI chat with bounded conversation history, incremental message rendering, provider/model split, per-provider model memory, connector model catalogs, custom model IDs, and request-scoped credential/share handling.

Add an optional in-memory triangle-level collision/IPR distance path while preserving the sampled and bbox fallback behavior, with tests comparing sampled and triangle contact distances.

Tests: tools/check_maintainability.py --strict; pytest (474 passed, 1 skipped); ui npm test (69 passed).
Add reviewed mesh workspace triangle extraction for collision/IPR evaluation so reviewed per-tooth STL assets are loaded from the local mesh registry without serializing geometry into plan JSON. Keep unreviewed, missing, or invalid mesh links fail-closed on the existing sampled/bbox fallback.

Add an SSE chat streaming endpoint and UI stream consumer for connectors that advertise streaming, while retaining the existing JSON chat fallback for local and non-streaming providers.

Refresh TODO.md to mark Phase 16 and the Phase 15 streaming remainder complete, prune stale roadmap text, and update effectiveness/status estimates.

Verification: pytest (478 passed, 1 skipped); cd ui && npm test (69 passed); tools/check_maintainability.py --strict (passed).
Implement the next Phase 9 wave for printable aligner shells by splitting shared shell assembly from backend-specific surface generation, upgrading the optional Open3D robust backend from repair-only behavior to repair plus distance-field offset correction, and adding synthetic messy/full-arch validation metrics for robust-vs-pure shell QA.

Wire the validation metrics into the benchmark report and add tests for both no-extra skip behavior and Open3D-enabled validation cases. Add a dedicated mesh-processing CI lane so the optional robust path is exercised without changing the default no-extra install.

Clean TODO.md and application maturity docs so completed Phase 9/15/16 work is summarized once, remaining work is limited to Phase 14 segmentation maturity and Phase 12 CBCT/root-bone automation, and current maturity scores reflect the implemented shell, collision, and chat streaming work.

Verification: python3 -m pytest tests/test_shell_backend.py tests/test_validation_benchmarks.py tests/test_aligner_shell.py -q; python3 tools/check_maintainability.py --strict; python3 -m pytest -q (with local loopback socket permission).
Guided "Review your plan" step now leads with an at-a-glance dashboard:
a verdict hero (ready / needs-review / cannot-assess) plus edit-diff,
warnings, root/bone, and print-readiness cards and 3D overlay chips,
backed by the new guidedReviewDashboard() in ui/guided.js.

Post-implementation audit fixes (found while reviewing the change):

- Severity classification: the dashboard counted EVERY finding as a
  blocking warning. Findings carry severity info|notice|warning, and the
  rule engine emits an `info` root-bone-context finding for healthy
  root/bone-aware plans, so a clean plan wrongly read "Needs review —
  1 warning(s)". Only warning-severity findings now drive the verdict,
  summary, and warnings card; unknown/missing severity is treated as a
  warning (fail-safe surfacing).
- Overlay chips: highlights matched code.includes("movement-cap"/
  "collision"), which also matched the *-scale-unconfirmed NOTICE codes
  (check skipped, not violated), producing phantom overlay chips.
  Highlights are now derived from warning-severity findings only.

Tests: realistic fixtures (findings carry severity) plus regressions
proving an info finding stays "ready" and skipped-check notices emit no
overlay chips. Full suite green (75 UI, 489 Python, maintainability).

Docs: README hero image swapped to the intraoral arches photo;
application maturity (Track 2) and TODO current-status updated to
describe the severity-aware guided review dashboard.
@john-lawniczak john-lawniczak merged commit 450e8d4 into main Jun 12, 2026
3 of 4 checks passed