Feat/v1.2 by john-lawniczak · Pull Request #2 · john-lawniczak/OpenSourceOrtho

john-lawniczak · 2026-06-09T17:17:28Z

Checklist

PR title uses a type tag (Add:, Fix:, ...).
pytest -q passes.
python tools/check_maintainability.py --strict passes.
cd ui && node --test passes.
New behavior and any fixed regression have tests.
No patient data; no verdict language; mechanics stay in deterministic rules.
Docs updated if behavior or provider behavior changed.

Click a tooth in the 3D preview to select it, then nudge its final in-plane position (mesiodistal x + front-back y) in 0.2 mm steps. The authored target is written as a normal source:"manual" stage delta, so the engine still computes all movement and Generate Plan re-stages it via the existing authored path (aligner count from the standard timeline projection).
Honesty constraints baked in: translation only (no rotation - scan frame is rotation_renderable=false; no vertical z), gated on confirmed scan units (mm), and framed as a geometric target, not a treatment goal or approval.
- ui/manual_edit.js (pure/DOM-free) + ui/manual_edit.test.js (13 cases)
- ui/viewer3d.js: proxies tagged with userData.tooth; click-raycast picking (click-vs-orbit threshold) + setSelectionHandler/Enabled/Tooth + highlight
- ui/render.js: renderManualEdit() panel + selection wiring
- ui/app.js: nudge/reset/deselect handlers; ui/state.js: manualEdit state
- ui/index.html + ui/styles.css: Manual target panel
- No backend/segmentation-API change: picking resolves the tooth from the rendered crown the segmentation API already provides
Tests: 38 UI, 292 pytest, maintainability --strict all green.

Introduce an optional Open3D mesh-processing extra and a hybrid arch segmenter that scores candidate tooth boundaries from arch position, crown-height valleys, curvature, and face-normal changes before producing graph-cut-style per-tooth STL proposals. Wire the hybrid segmenter through the existing SegmentationModel seam and /api/segment response metadata while keeping the prior heuristic as fallback. Add tests for the hybrid diagnostics, backend metadata, and safer manual review behavior. Improve the segmentation review UI by surfacing proposal method/backend details, flagging low-confidence rows, and rejecting duplicate corrected FDI tooth numbers during explicit apply. Refresh README, HOW_TO, architecture, UI, license, and source-reference docs to reflect the optional Open3D path and the legal boundary around GPL external tools.

Make tooth-segmentation accuracy measurable. Until now the hybrid segmenter shipped without any correctness metric - nothing checked whether a proposed cut landed on the right boundary or carried the right FDI label, so changes could not be proven to improve accuracy. - orthoplan/validation/segmentation_truth.py: build a synthetic arch whose per-triangle tooth membership is known by construction (cosine crowns + valleys on a horseshoe), run the active segmenter, and score it. Metrics: region_purity (boundary quality, label-invariant), triangle_label_accuracy (boundary + correct FDI label), labels_recovered, mean_centroid_arc_error_deg. - orthoplan/validation/segmentation_cases.py: two Measurement Truth Lab cases on the active load_local_segmenter() - segmentation-full-arch-accuracy (PASS gate: exact count, purity >= 0.78, label_acc >= 0.55) and segmentation-missing-tooth (records tooth_count_error from the fixed-canonical-count assumption; gates purity + labels_recovered). Synthetic, PHI-free, deterministic, CI-fast. - Wire both into measurement_truth_cases(); surfaced via orthoplan measurement-lab. - tests/test_segmentation_truth.py (7 cases). Baseline on a clean synthetic full arch: purity 0.826, label_acc 0.629 - the segmenter mislabels ~37% of triangles even on ideal input, and a missing tooth yields tooth_count_error 1. The purity/label-accuracy gap is the labelling cascade, which this harness now measures so the next task (data-driven tooth count + missing-tooth handling) can prove a gain. Tests: 300 pytest, maintainability --strict, 38 UI tests all green.
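The gap between region_purity and triangle_label_accuracy described above can be illustrated with a minimal sketch. The formulas here are assumptions (majority-vote purity and plain per-triangle agreement), not necessarily the ones in orthoplan/validation/segmentation_truth.py, and the label lists are made-up data.

```python
from collections import Counter


def region_purity(true_labels, predicted_labels):
    """Share of triangles that sit in the majority true tooth of their predicted region.

    Label-invariant: only the grouping matters, not the label names.
    (Assumed definition, for illustration only.)
    """
    regions = {}
    for truth, predicted in zip(true_labels, predicted_labels):
        regions.setdefault(predicted, []).append(truth)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in regions.values()
    )
    return majority_total / len(true_labels)


def triangle_label_accuracy(true_labels, predicted_labels):
    """Share of triangles whose predicted FDI label equals the true one."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Hypothetical data: boundaries are perfect, but every label is shifted by one tooth.
truth = [11, 11, 12, 12, 13, 13]
shifted = [12, 12, 13, 13, 14, 14]

print(region_purity(truth, shifted))            # boundaries are clean
print(triangle_label_accuracy(truth, shifted))  # labels have all cascaded
```

With clean boundaries and a one-position label shift, purity stays at its maximum while label accuracy collapses, which is the labelling cascade the harness is meant to expose.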

The segmenter assumed a full canonical 14-tooth arch and always made that many cuts, so an arch with a tooth missing was over-split (one real tooth fragmented, all labels cascaded). The accuracy harness pinned this as tooth_count_error 1. Detect the count from the scan instead of assuming it: - arch_profile.detect_cut_count: counts prominent embrasures (minima for the height profile, maxima for the hybrid cost signal) with min-separation dedup, capped at canonical; returns 0 (caller falls back to canonical) on a flat or unusable signal so low-relief scans never collapse. - heuristic.resolve_tooth_count / teeth_from_profile: bounded count + FDI labels. Both segmenters count from the physical HEIGHT valleys (most reliable) and keep their own signal for boundary PLACEMENT. Shared arch_profile.arc_signal extracted. - On a count != canonical: per-segment confidence x0.6 and a linted review advisory (auto.build_count_advisories) via /api/segment, because which tooth is absent cannot be known from crown geometry - FDI labels on a gap arch are a positional guess the user must review (follow-on: a 'mark missing tooth' signal). Harness ratcheted: segmentation-missing-tooth now asserts tooth_count_error == 0 and region_purity >= 0.80. On the 13-tooth synthetic arch: 13/13 regions (err 0), purity 0.83 -> 0.87. Full arch unchanged (14/14). triangle_label_accuracy drops on gap arches (0.61 -> 0.19) - the recorded, honest cost of unsolved FDI-on-gaps; region quality and count are the geometry-solvable win this delivers. Tests: 303 pytest (+3), maintainability --strict, 41 UI tests all green.
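A minimal sketch of the counting idea, assuming a 1-D height profile: count local minima that are deep enough relative to the signal range, drop any that sit too close to a deeper one, cap the result, and return 0 on a flat signal so the caller can fall back to the canonical count. The function name, the simplified depth measure, and the 0.25 ratio are illustrative assumptions, not the signature or constants of arch_profile.detect_cut_count.

```python
def count_prominent_minima(signal, min_separation, cap, prominence_ratio=0.25):
    """Count deep, well-separated local minima in a 1-D profile (illustrative only)."""
    span = max(signal) - min(signal)
    if span <= 1e-9:
        return 0  # flat or unusable signal: caller falls back to the canonical count

    threshold = prominence_ratio * span
    candidates = []
    for i in range(1, len(signal) - 1):
        if signal[i] <= signal[i - 1] and signal[i] < signal[i + 1]:
            # Simplified depth: distance below the lower of the highest
            # sample on each side (a stand-in for true prominence).
            depth = min(max(signal[:i]), max(signal[i + 1:])) - signal[i]
            if depth >= threshold:
                candidates.append((depth, i))

    kept = []
    for _depth, i in sorted(candidates, reverse=True):  # deepest first
        if all(abs(i - j) >= min_separation for j in kept):
            kept.append(i)
    return min(len(kept), cap)


# Hypothetical profile: two real embrasures and one shallow dip.
profile = [5, 5, 1, 5, 5, 1.2, 5, 5, 4.8, 5, 5]
print(count_prominent_minima(profile, min_separation=2, cap=13))
print(count_prominent_minima([3, 3, 3, 3], min_separation=2, cap=13))
```

The shallow dip at 4.8 is rejected by the depth threshold, and the flat profile yields 0 rather than a collapsed count.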

…& Photos guide tab Investigation: the sidebar 'Key Terms & Tooth Map' link (and the FDI tooth map inside it) appeared dead for new users. Root cause is a mode/CSS conflict, not a broken handler: the link lives in the sidebar (visible in both modes) and clicking it sets activeStep=glossary and activates #panel-glossary, but 'body[data-mode="simple"] .panel { display: none }' keeps every technician panel hidden in the default guided mode - so nothing appears. Fix: - CSS override (ID selectors beat the mode rule): when the active step is an info panel in guided mode, surface #panel-glossary / #panel-photos over the wizard and hide #guided. - Back affordance so the panels are not dead ends: goToStep() remembers the prior step (state.returnStep) when an info step opens; a 'Back' button in each panel returns there. INFO_STEPS = [glossary, photos]. New tab - Imaging & Photos Guide (#panel-photos), mirroring the glossary and reachable from the sidebar in both modes: - Relative-value table (CBCT DICOM 10/10, STL+Periapical 7-8/10, STL+Panoramic 7/10, STL+Photos 6/10, STL only 5/10, Panoramic only 5/10, Photos only 3/10) with what each input adds and a rough USD cost. - Framed as how much each record helps THIS engine close data gaps (the planner uses crown-surface STL geometry only); educational, not medical advice, and noting X-ray/CBCT are ionizing radiation ordered by a licensed professional. Files: ui/index.html, ui/state.js, ui/app.js, ui/styles.css, docs/UI_DESIGN.md. Validated: 41 UI tests, JS syntax, HTML structure + unique ids; e2e selectors untouched (additive), runs in CI.

Geometry cannot tell WHICH tooth is absent, so on a gap arch the data-driven count gets the right number of regions but only positional FDI labels (triangle label accuracy ~0.19 on the synthetic gap arch). This adds the user signal that closes that gap: the user marks the missing tooth, and regions are labelled by the canonical order minus that tooth.
- auto.SegmentationModel.segment now accepts tooth_values; auto.tooth_values_for_arch builds the explicit labels from marked gaps (filtered to the scan's arch).
- segmentation_api parses payload.missing_teeth and threads it per-arch into the segmenter; the count-difference advisory still surfaces for review.
- UI: a 'Missing teeth (FDI, optional)' field in the segmentation panel; segment.parseMissingTeeth (pure, unit-tested) sends {missing_teeth} to /api/segment.
- Harness: new segmentation-missing-tooth-marked case proves the gain - marked label accuracy 0.60 vs unmarked 0.19 on the 13-tooth synthetic arch, purity held.
This turns the recorded triangle_label_accuracy drop from the data-driven-count change back into a gain, and pairs with the per-tooth manual correction already in the review UI. Tests: 306 pytest (+5), maintainability --strict, 40 UI tests all green.
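The "canonical order minus the marked tooth" labelling can be sketched as follows. The arch orderings and the function name are assumptions for illustration (a 14-tooth arch without third molars, listed from one side of the arch to the other); the real helper is auto.tooth_values_for_arch, whose signature is not shown in this PR.

```python
# Assumed canonical 14-tooth FDI orderings (third molars excluded).
UPPER = [17, 16, 15, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 27]
LOWER = [47, 46, 45, 44, 43, 42, 41, 31, 32, 33, 34, 35, 36, 37]


def labels_for_arch(arch, missing_teeth):
    """Return the canonical labels for one arch with the marked gaps removed.

    Marked teeth that belong to the other arch are ignored, mirroring the
    'filtered to the scan's arch' behaviour described above.
    """
    canonical = UPPER if arch == "upper" else LOWER
    missing = set(missing_teeth) & set(canonical)
    return [tooth for tooth in canonical if tooth not in missing]


# Hypothetical request: 14 is marked on the upper arch, 36 belongs to the lower arch.
print(labels_for_arch("upper", [14, 36]))
```

The 13 detected regions are then labelled in order from this list, so every label after the gap lines up instead of shifting by one position.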

…tabs The single 'Key Terms & Tooth Map' panel held both the FDI tooth map and the A-Z glossary. Split into two independent sidebar tabs: - Tooth Map (#panel-toothmap): FDI numbering explainer, chart, quadrant map. - Glossary (#panel-glossary): searchable key-terms list. Both (plus Imaging & Photos Guide) are reachable from the sidebar in either mode, surface over the guided wizard when active, and have their own Back button. INFO_STEPS += toothmap; the guided-mode CSS override covers #panel-toothmap. Validated: 40 UI tests, HTML structure + unique ids; e2e selectors untouched.

Surface what the engine already computes and close the loop with the mark-the-missing-tooth signal: - Per-tooth confidence in the review rows is now tier-coloured (low <45 red / mid <65 amber / high green) with a 'Review' tag on low, so low-confidence and count-mismatch teeth stand out instead of reading as a uniform bar. - When a proposed arch is not a full arch (14), an amber banner prompts the reviewer to enter the missing FDI number in 'Missing teeth' and use a new 'Re-anchor labels' button, which re-runs the proposal with the marked gap (same code path as Propose) so the FDI labels line up around the gap - no re-upload. - Pure logic (confidenceTier, countNoteMarkup, FULL_ARCH_TEETH) lives in DOM-free core.js and is unit-tested; render.js imports it; app.js wires reanchorSegment. Tests: 43 UI tests (+3), maintainability --strict green.

The auto-segmentation review (proposal, per-tooth corrections, marked missing teeth, applied fragment) was lost on every page reload. It is working state, not plan data, so it is now persisted browser-local in localStorage keyed by plan id - without polluting the TreatmentPlan model (which the version snapshot validates strictly). - storage.js: saveSegmentationReview / restoreSegmentationReview / clearSegmentationReview (localStorage, plan-id keyed, failures non-fatal). - app.js: persist on propose/re-anchor, apply, per-tooth correction, include toggle, and missing-teeth input; restoreStoredSegmentation() on load. - restorePlan now rehydrates state.segmentation.applied from the restored snapshot's mesh_assets/tooth_meshes (previously dropped on restore) and reloads the plan's review draft; the snapshot's applied meshes win for that version. Tests: 45 UI tests (+2 storage round-trip and no-plan-id/no-storage guards). maintainability --strict green (no Python changed).

Model the real open-extraction-gap case in the segmentation harness, then fix the bug it found. - build_synthetic_arch(gaps=...) fills a tooth's sector with a flat low gum surface (no crown), leaving a true one-tooth-wide hole; gum triangles carry no ground-truth tooth and the arch's tooth_values are the crowns actually present. - The harness immediately surfaced a real bug: the data-driven counter, which counted the VALLEYS between crowns, over-counted an open gap - the wide gum floor's two shoulders each read as a cut, so a 13-crown arch was counted as 14. - Fix: resolve_tooth_count now counts crown PEAKS in the height profile instead of valleys. A gum hole has no peak, so it can never read as an extra tooth. Correct across every scenario - full 14/14, congenitally-absent 13/13, open extraction gap 13/13, two adjacent extractions 12/12 (tooth_count_error 0 throughout). - New lab case segmentation-open-gap gates the count and purity on the gum-hole arch; resolve_tooth_count signature simplified (always crown peaks). This answers the open question before trusting the counter on real scans: a true gap is now handled, and the mark-the-missing-tooth signal still recovers labels. Tests: 310 pytest (+4), maintainability --strict, 45 UI tests green.
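Why counting peaks is robust to a gum hole can be shown with a small sketch. The profile values, the function name, and the 0.18 ratio are illustrative assumptions rather than the code or constants in resolve_tooth_count at this commit.

```python
def count_crown_peaks(profile, prominence_ratio=0.18):
    """Count local maxima that rise clearly above the profile floor (illustrative only)."""
    floor = min(profile)
    span = max(profile) - floor
    if span <= 1e-9:
        return 0  # flat profile: nothing to count
    threshold = prominence_ratio * span
    peaks = 0
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_local_max and profile[i] - floor >= threshold:
            peaks += 1
    return peaks


# Hypothetical height profile: crown, crown, flat gum floor (extraction gap), crown.
# The 0.2 samples either side of the gap are the 'shoulders' that a valley
# counter can mistake for two separate cuts.
with_gap = [0.2, 1.0, 0.2, 1.0, 0.2, 0.0, 0.0, 0.0, 0.2, 1.0, 0.2]
print(count_crown_peaks(with_gap))
```

The flat gum floor never satisfies the local-maximum test, so the gap contributes no tooth however wide it is.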

Every segmentation test so far ran on synthetic arches built to present crowns as clean height peaks. This runs the SAME active segmenter on the real canonical sample scans (~990k / ~860k vertices) to answer: does the synthetic-tuned algorithm survive real geometry? It is a loose smoke check, not an accuracy gate (the scans are unlabelled): no crash, finishes well under a 30s budget (~0.5s actual), a plausible crown count within the algorithm's own [floor, canonical] bounds, and valid unique in-arch labels and confidences. It records the observed counts so the gap is tracked. Finding it surfaced (documented in the test docstring): the mandibular scan counts 14/14 crowns correctly, but the maxillary scan counts only 7/14 - the upper occlusal height profile (curve of Spee/Wilson, palate, shorter anteriors) presents only 7 peaks prominent enough to clear the relative threshold. The lower-arch case is gated (>= 12) to catch real-data regressions while the upper undercount stays open as the next task. Tests: 313 pytest (+3), maintainability --strict green.

The sim-to-real diagnostic showed the crown-peak counter, tuned on synthetic arches, undercounting the real upper scan 7/14. Cause: at the coarse profile resolution the flat maxillary occlusal plane (curve of Spee/Wilson, palate) merges adjacent crowns into a single height peak, so fewer than half register. Fix: count crowns from a FINER, dedicated height profile (16 buckets/tooth, prominence ratio 0.18, half-tooth minimum peak separation), decoupled from the coarser profile the segmenters use to PLACE boundaries. Counting and placement no longer share a resolution, so merged real crowns resolve without disturbing boundary placement or per-tooth purity. - heuristic.teeth_from_profile -> teeth_from_signal(arch, positions, heights): builds its own count profile; resolve_tooth_count scales peak separation to the profile resolution. hybrid passes positions/heights through. - Result: maxillary 7 -> 12, mandibular 14 (unchanged). 12/14 is the realistic ceiling for 1-D height counting (two upper crowns genuinely merge); the rest is a positional guess the user closes via mark-the-gap / re-anchor. - No regression: synthetic cases unchanged (full 14, missing 13, open-gap 13, two-gap 12), flat-fixture fallback holds, mandibular real count held. The real-scan diagnostic now gates floors (maxillary >= 10, mandibular >= 12). Tested against the real sample scans AND the synthetic harness. Tests: 314 pytest (+1 real-scan regression guard), maintainability --strict green.
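The effect of profile resolution on counting can be sketched with a simple bucketed height profile. Taking the maximum height per arc-position bucket is an assumption made for this example; the PR does not state how the count profile aggregates heights, and the sample data is invented.

```python
def height_profile(positions, heights, n_buckets):
    """Bucket samples by arc position and keep the tallest height in each bucket."""
    lo, hi = min(positions), max(positions)
    width = (hi - lo) / n_buckets
    if width == 0:
        width = 1.0  # degenerate input: everything lands in bucket 0
    profile = [0.0] * n_buckets
    for position, height in zip(positions, heights):
        index = min(int((position - lo) / width), n_buckets - 1)
        profile[index] = max(profile[index], height)
    return profile


# Hypothetical samples: four crowns (height 1.0) separated by embrasures (0.1).
positions = [i / 8 for i in range(9)]
heights = [0.1, 1.0, 0.1, 1.0, 0.1, 1.0, 0.1, 1.0, 0.1]

print(height_profile(positions, heights, n_buckets=2))  # crowns merge
print(height_profile(positions, heights, n_buckets=8))  # crowns resolve
```

At two buckets every bucket contains a crown, so no valley survives and adjacent crowns read as one plateau; at eight buckets the alternating pattern is visible again. Keeping a separate, finer profile for counting leaves the coarser placement profile untouched.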

… tooth A proposed count below a full arch is ambiguous: a tooth may be absent, OR two crowns may have merged into one region (common on the flat upper occlusal plane - the maxillary 12/14 case from the real-scan fix). The banner previously always told the reviewer to 'enter the missing tooth', which is misleading for a merge. - core.countNoteMarkup now takes markedGapCount. With no gap marked it states the ambiguity honestly ('Some crowns may have merged into one region, or a tooth may be absent ... otherwise review the tooth numbers'). Once the reviewer has marked gap(s) it becomes confirmatory ('Proposed N for your M marked gaps - review, then apply') and no longer re-prompts. - render.js passes the marked count (parseMissingTeeth of the missing-teeth field). Tests: 47 UI tests (+2 for the ambiguous and confirmatory branches).

…draw; Rx guide & prompts 3D viewer / guided wizard (ui/): - Show the single 3D viewer in the guided "Teeth & time" and "Details" steps, not just "3D preview". The viewer is relocated into the active step's host (plan/details/preview) by guided.js; in teeth/details the technician toolbar, manual-target panel, and caveats are hidden so it reads as a focused aid. - Visual tooth selection: clicking a tooth in those steps toggles "hold still" (folds into fixed_teeth), the visual equivalent of the checkbox list. Held teeth render in a blue-grey HELD material. Picking is enabled there without requiring confirmed mm units. The Details step forces overlay at the last stage so the "Preview movement scale" slider visibly drives on-screen motion. - Anchor the per-tooth movement proxies onto the uploaded scan instead of floating them in schematic arch space below it (the "anchors" in the sample). On scan load, computeScanAnchors() fits the schematic arch into each arch's world bounding box (x/z) and raycasts the scan surface for the occlusal height, caching a per-tooth anchor point + fit scale; update() places and scales the proxies on those anchors so movement reads on the scan's own crowns. With no scan, the schematic fallback is unchanged. Movement stays simulated/labeled. NOTE: WebGL can't render in the sandbox; needs an in-browser visual check. Tooth-numbering chart (ui/tooth-chart.svg, docs/images/fdi-tooth-map.svg): - Replaced the flat box chart with a generated occlusal "into the mouth" view: pink gum + palate/tongue pads, mucosa depth, soft shadows, glossy enamel with cusp detail, anatomically proportional crown sizes, Universal numbers in circles + small FDI numbers, quadrant labels. Generated by new committed tools/gen_tooth_map.py + tools/tooth_map_draw.py. - Tooth Map panel + GLOSSARY: define both FDI and Universal and explain simply why two numbering systems exist. 
Imaging & Photos guide (ui/index.html): - Split the "starting from scratch" advice into two bullets and added a "What is the Rx file?" card (scanner-export prescription is context only, not part of the STL, carries no geometry). Prompts: - New prompts/ directory: README + transverse-arch-width-sanity-check.md (example iTero/OrthoCAD independent arch-width check; educational, not a diagnosis). Verified: node --check (all changed JS), node --test (47), pytest (314 passed, 1 e2e skipped — Playwright not installed), check_maintainability.py --strict.

…swapped On a two-arch scan, computeScanAnchors() placed each arch's text label on the occlusal (bite) side - upper label just under the upper arch, lower label just over the lower arch - so both landed in the bite gap and read as swapped. Place them on the outside instead: the upper label above the upper arch and the lower label below the lower arch, clear of the crowns.

…cc 0.63->0.93) Root cause: both segmenters placed inter-tooth cuts from rigid equal-spacing nominals, snapping only within +/-half a tooth and clamping a colliding cut to previous+1. When two nominals wanted the same embrasure that produced a one-triangle SLIVER region, which shifted every downstream FDI label by one position - capping the gated full-arch triangle_label_accuracy near 0.63 despite clean geometry. Fix: a shared place_cuts() (orthoplan/segmentation/arch_profile.py) selects the most PROMINENT valleys (or score peaks, for the hybrid segmenter) subject to a minimum separation, so cuts land on the true embrasures of an anatomically uneven arch and two cuts can never collapse onto one. Equal spacing remains only as a fallback to guarantee the required number of distinct boundaries. Rewired heuristic.find_boundaries and hybrid._find_graph_cut_boundaries onto it. Measured on the synthetic accuracy harness: - full arch: label 0.63 -> 0.93, region purity 0.83 -> 0.93 - open gap: label 0.19 -> 0.87 - realistic: label/purity 0.93 (new gated case) Raised the full-arch regression floors (purity 0.78 -> 0.88, label 0.55 -> 0.85). Accuracy harness made realistic so future segmenter work can prove a real-scan gain: synthetic_arch.build_synthetic_arch now supports uneven crown widths (realistic_widths: molars wide, incisors narrow), flat molar occlusal plateaus (occlusal_flat, which merge into one peak and tempt under-counting), and deterministic per-triangle height noise. Added the gated segmentation-realistic-arch-accuracy case (uneven + flat + noise, floors 0.85) plus tests for it and for sliver-free regions on an uneven arch. Refactor: arch CONSTRUCTION moved to orthoplan/validation/synthetic_arch.py; segmentation_truth.py keeps scoring and re-exports the construction API (keeps both files within the maintainability line/function caps). 
Note: the real OrthoCAD upper shell still detects 12 of ~14 crowns - that is the COUNT-detection signal (resolve_tooth_count), a separate lever from boundary placement and the next accuracy target; the realistic harness is the measurable floor a learned backend would have to beat. Verified: pytest 316 passed / 1 skipped, check_maintainability.py --strict, node --test 47.
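The cut-placement idea (deepest valleys first, a minimum separation so two cuts cannot share an embrasure, and an equal-spacing fallback to guarantee the count) can be sketched as below. This is not the body of place_cuts in orthoplan/segmentation/arch_profile.py: the name, the use of raw depth as a stand-in for prominence, and the fallback details are all assumptions.

```python
def place_cuts_sketch(signal, n_cuts, min_separation):
    """Pick n_cuts interior indices, deepest first, at least min_separation apart."""
    interior = list(range(1, len(signal) - 1))
    chosen = []
    for i in sorted(interior, key=lambda idx: signal[idx]):  # lowest samples first
        if len(chosen) == n_cuts:
            break
        if all(abs(i - j) >= min_separation for j in chosen):
            chosen.append(i)

    # Fallback: equal-spacing nominals, then any unused interior index, so the
    # required number of distinct cuts is still returned on a flat or cramped
    # signal. The fallback deliberately ignores min_separation.
    step = len(signal) / (n_cuts + 1)
    nominals = [min(max(int(k * step), 1), len(signal) - 2) for k in range(1, n_cuts + 1)]
    for i in nominals + interior:
        if len(chosen) == n_cuts:
            break
        if i not in chosen:
            chosen.append(i)
    return sorted(chosen)


# Hypothetical uneven arch: embrasures at 1, 6 and 10, plus a low sample at 7
# right next to the embrasure at 6.
signal = [5, 1, 5, 5, 5, 5, 0.5, 1.2, 5, 5, 2, 5]
print(place_cuts_sketch(signal, n_cuts=3, min_separation=2))
```

Index 7 is the third-lowest sample, but it is rejected because it sits one step from the cut at 6. That rejection is what prevents the one-triangle sliver region and the downstream label shift described above.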

…ak separation resolve_tooth_count counted crown peaks with a minimum separation of half an AVERAGE tooth. On a real arch the narrow anterior teeth cluster closer in arc-position (polar angle about the arch centroid) than that, so two real incisor peaks were merged and the upper arch under-counted at 12/14. The fine count profile already resolved all 14 peaks - the separation, not the signal, was the limiter (so 12/14 was not the merged-peak ceiling the docstring assumed). Tighten _COUNT_SEPARATION_FRACTION 0.5 -> 0.35 (a third of a tooth). Swept the value against both real shells and every synthetic case: real upper recovers 12 -> 14, lower stays 14, and no synthetic arch over-counts even at 0.3 height noise (the prominence threshold still rejects noise bumps). The bundled canonical scans now recover 14/14 on both arches. Lock-in + docs: - Raised the real-scan crown-count floors (test_segmentation_real_scan) from maxillary 10 / mandibular 12 to 14 / 14, and corrected the module docstring (the under-count was a separation artefact, not a 1-D height ceiling). - Confirmed on a real OrthoCAD export (not in the repo): both arches 14/14 with correct FDI labels. Verified: pytest 316 passed / 1 skipped, check_maintainability.py --strict.
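How the separation fraction alone can merge two real peaks is easy to show in isolation. The sketch below assumes a greedy tallest-first dedup and 16 buckets per tooth (the count-profile resolution mentioned in the earlier commit); the peak positions and heights are invented, and the real logic lives in resolve_tooth_count.

```python
BUCKETS_PER_TOOTH = 16  # count-profile resolution from the earlier commit


def dedupe_peaks(peaks, min_separation):
    """Keep peaks tallest-first, dropping any closer than min_separation to a kept one."""
    kept = []
    for _height, position in sorted(peaks, reverse=True):
        if all(abs(position - other) >= min_separation for other in kept):
            kept.append(position)
    return sorted(kept)


# Hypothetical (height, bucket index) peaks: two narrow incisors 7 buckets
# apart, then a wider tooth further along the arch.
peaks = [(1.0, 0), (0.9, 7), (1.1, 20)]

wide = dedupe_peaks(peaks, 0.5 * BUCKETS_PER_TOOTH)    # half an average tooth = 8 buckets
tight = dedupe_peaks(peaks, 0.35 * BUCKETS_PER_TOOTH)  # about a third of a tooth = 5.6 buckets
print(wide, tight)
```

At a separation of 8 buckets the second incisor is swallowed by its neighbour; at 5.6 buckets all three peaks survive. The signal is identical in both runs, which matches the finding that the separation, not the profile, was the limiter.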

… in 3D Once the user applies a segmentation, the viewer now shows the REAL per-tooth crowns moving on the scan instead of synthetic proxies on a schematic arch - closing the "anchors move, not the teeth" gap for segmented scans. viewer3d.js: - loadToothFragments() fetches each per-tooth STL fragment (which carries the original scan-space triangles), orients it exactly like the shell (orientScanGeometry) and deliberately does NOT center it, so the fragments reconstruct the arch sitting on the real crowns. Cached by FDI value; cleared when the scan changes. - update() gains a fragment mode (active when fragments are loaded and a scan is present): the planned layer draws the real crowns translated by worldDeltaOriented() - the orientScanGeometry-consistent movement map (scan x,y,z -> world x,z,-y) - while the whole-arch shell remains the static baseline (shown in current/overlay, hidden in planned). Crowns are pickable, honour the held/selected materials, draw movement lines and tooth-number labels. Per-tooth ROTATION is deferred (translation is the unambiguous, high-value motion; correct rotation needs a trusted oriented per-tooth frame). render.js: - Routes render_meshes whose source is "model-generated" (applied segmentation), when a scan is loaded, to loadToothFragments; demo/class meshes keep the existing centered + schematic-arch path. Verified the server data path end to end in Python (segment the bundled canonical upper scan -> apply the fragment -> evaluate): 14 tooth_meshes, every render_mesh source "model-generated", 14 tooth_frames. That check caught the provenance value being the hyphenated "model-generated" (not "model_generated"), which the viewer filter now matches - otherwise fragment mode would have silently never activated. The 3D result itself is not renderable in this environment; it needs an in-browser check (crowns overlay the scan and move sensibly, especially the front-back direction). 
The sample test case is unchanged (no applied segmentation -> proxies). Verified: node --test 47, pytest 316 passed / 1 skipped, maintainability --strict.

… pegs) For a whole-arch scan that has NOT been segmented there are no per-tooth crowns to move, so the viewer previously floated synthetic peg crowns anchored on the scan - which read as "the anchors move, not the teeth". Replace that with an honest indicator layer: a small teal marker dot on each tooth plus a blue arrow showing where that tooth is planned to move (length scaled by the preview slider). No fake crowns; the scan itself stays the teeth. viewer3d.js: - New arrowMode (a scan is loaded with anchors but no segmented fragments). Each pose draws a marker at the tooth's on-scan anchor and, when planned and the tooth is not held still, an arrow via addMovementArrow() (shaft line + shared cone head) along worldDeltaOriented() - the same scan-consistent movement map as the segmented crowns. Markers carry userData.tooth and reuse the HELD/SELECTED materials, so guided click-to-hold and the held tint still work; below a small displacement only the marker shows. - Precedence: fragmentMode (segmented -> real crowns) > arrowMode (scan, no seg -> markers+arrows) > schematic proxies (no scan at all, e.g. the educational demo). This also upgrades the sample test case (canonical scans, no applied segmentation) from the floating pegs to markers + arrows. The 3D result is not renderable in this environment; needs an in-browser check of marker placement and arrow direction (shared worldDeltaOriented sign). Verified: node --test 47, pytest 316 passed / 1 skipped, maintainability --strict.

…ided UI In-browser confirmed the new 3D indicators work (teal markers on teeth, click-to-hold, movement arrows). Make their meaning explicit so users understand the view: - Plan (Teeth & time) step: clearer copy - "hold a tooth still" means the plan leaves it where it is; untick it in the list or click its dot in the 3D view, click again to release. - New .viewer-legend key under the plan viewer: teal dot = a tooth you can click to hold still, blue-grey dot = held still (won't move), blue arrow = which way that tooth is planned to move (simulated; the Details slider scales it). Swatch colours match the viewer materials. - Details step: note that the scale slider grows/shrinks the arrows and that clicking a dot still holds a tooth. HTML/CSS only. Verified: node --test 47, maintainability --strict.

…ar of crowns Cosmetic pass after the in-browser check (TEST.mov confirmed the Details slider scales the movement arrows correctly). - Markers were anchored to the incisal/occlusal edge, which faces the bite, so in an anterior view both arches' dots bunched into the central gap rather than reading as one-per-tooth. Re-anchor each marker onto the tooth's BUCCAL face at mid-crown: push outward from the arch centre (scanHx*0.10) and lift toward the crown body (span*0.22 from the occlusal edge). Dots now sit on the visible faces and the upper/lower rows separate; arrows originate from the face. - Arch labels: scale the offsets to the arch height (span*0.18 above the upper / below the lower, +scanHz*0.35 toward the camera) so they clear the crowns instead of sitting among them. Only arrow-mode marker/arrow positions change; fragment mode (real crowns) and the no-scan schematic proxies are untouched. Verified: node --test 47, maintainability --strict. The 3D placement itself needs an in-browser confirm.

…audit)
Post-implementation audit of the 3D segmentation/arrow work. Three correctness fixes + test coverage for the new accuracy primitive.
1. loadToothFragments silent failure / unhandled rejection (viewer3d.js): A network throw in fetch() rejected the Promise.all, and the caller (render.js) has no .catch, so a transient fetch error became an unhandled rejection and aborted the whole fragment load. Wrap each fragment fetch in try/catch: failures are swallowed per item (the shell still covers that tooth) and a partial load renders the crowns that did arrive.
2. Stale-segmentation misalignment (app.js setUploadedFiles): Uploading a new scan did not invalidate a previously-applied segmentation. Its per-tooth meshes are in the OLD scan's coordinates, so they would be loaded and rendered misaligned over the new scan. Reset the segmentation (proposal/edits/applied) on a new upload, preserving the user's missingTeeth input.
3. Geometry leak on dispose() (viewer3d.js): dispose() freed line geometries and label sprites but not the per-viewer fragment crowns or the uploaded-scan meshes. Dispose both (the shared synthetic/class caches are module-level and intentionally retained).
Tests:
- Direct unit tests for place_cuts (the shared adaptive cut placer): picks the deepest valleys at uneven spacing, enforces min-separation (the sliver / label-shift regression), selects maxima in peaks mode (hybrid path), and falls back to distinct sorted interior cuts on a flat signal. pytest 320 passed.
- Verified the bundled canonical scans are committed so the real-scan crown-count floor test (14/14) runs in CI rather than skipping - it is the only guard for the count-separation fix (the synthetic harness cannot reproduce the real arc-position clustering).
Not changed (flagged, lower-confidence): worldDeltaOriented front-back sign still needs an in-browser confirm; restorePlan() can re-apply a snapshot's tooth_meshes against a differently-loaded scan (same misalignment class, intentional path); arrow-mode marker push/lift magnitudes are visual taste. Verified: node --test 47, pytest 320 passed / 1 skipped, maintainability --strict.

…long scroll) In-browser review of the segmentation flow surfaced two UX papercuts: - "Apply accepted segmentation to plan" was a grey secondary button, so the review -> apply next step did not read as actionable. Make it a primary (teal) action with a trailing arrow. - The per-tooth proposal (up to 28 rows) was a single-column list in the panel, so reaching the Apply button below it meant scrolling the whole section. Make the list a self-contained scrollable box (max-height 320px, its own border, tighter rows) so Apply stays in view. Also noted from the in-browser check (no code change): segment -> apply -> moving real crowns works and the movement direction reads correctly. The "broken teeth" look in Planned view is the heuristic segmenter's rough wedge cuts separating under exaggeration (a segmenter-quality ceiling, not a placement bug; Overlay keeps the shell so gaps are filled). The learned backend remains the real fix. HTML/CSS only. Verified: node --test 47, maintainability --strict.

…pe learned backend
Three items from the in-browser review:
(A) Default to Overlay when segmentation is applied (segment.js): Applying segmentation now sets state.view="overlay" so the real per-tooth crowns move against the static shell, which fills the rough inter-tooth gaps. This avoids the "shattered" first impression of Planned view (shell hidden) on a heuristic-segmented arch. Users can still switch to Planned.
(B) Prominent, higher "Reading the 3D view" legend (index.html, styles.css): Moved the marker/arrow key from below the viewer to an accent card near the top of the guided Teeth step, leading with the click-to-hold explanation: "Click a tooth to hold it still — it turns blue-grey and the plan leaves it where it is." Directly answers the "what does clicking do / why are sections grey" confusion (grey = held-still teeth).
(C) Scope doc for the learned segmenter (docs/segmentation-learned-backend.md): The rough wedge cuts ("broken teeth") are the heuristic's quality ceiling. The doc scopes an ONNX MeshSegNet backend that drops into the existing ToothSegment contract (no API/UI/plan-model changes), ships as an optional extra with no torch runtime, keeps weights out of git, and is gated by the realistic accuracy harness plus a proposed crown-compactness metric. Status: proposal.
HTML/CSS/JS + docs. Verified: node --test 47, maintainability --strict.

Captures everything a fresh chat needs to continue Phase 1 of the learned tooth-segmentation backend without relying on prior conversation context: - Safety framing (educational, not a medical device; segmentation proposes, never diagnoses). - Repo/branch (feat/v1.2) and the test gates that must stay green (pytest 320/1, check_maintainability.py --strict, node --test 47) plus the maintainability caps and local commit/push workflow. - The drop-in contract to preserve (ToothSegment / load_local_segmenter / .segment(...)), and the already-wired downstream (segment_payload -> mesh_export -> render_meshes "model-generated" -> viewer3d fragment mode) that must NOT need changes. - Current state: heuristic segmenter, the recent place_cuts + count-separation accuracy fixes, and why the per-tooth meshes look "shattered" (rough wedge cuts = the quality ceiling the learned backend fixes). - Measurement gates (realistic synthetic harness + bundled real-scan 14/14 floor) and the new crown-compactness metric to add. - Hard constraints (optional ml-seg extra, onnxruntime only / no torch runtime, weights+datasets never committed, on-device privacy, heuristic stays the fallback) and a concrete Phase 1 definition of done. Linked from docs/segmentation-learned-backend.md. Docs only.

…both docs Baked the web-check findings into the scope doc and the handoff prompt so the next session starts from current facts instead of re-searching: - New "Model availability (checked June 2026)" section in docs/segmentation-learned-backend.md: MeshSegNet has MIT-licensed CODE and ships pretrained PyTorch weights (upper+lower), but there is NO ONNX export (PyTorch only; GLM layers take adjacency matrices, so export is fiddly), the WEIGHTS' license is undocumented (private clinical training data), and it needs real per-cell preprocessing (<=10k cells, 15-dim features, per arch, 15 classes -> FDI). Teeth3DS+/3DTeethSeg'22 is CC BY-NC-ND 4.0 (non-commercial, no derivatives) -> not usable for a reusable build. Cited sources inline. - Candidate approaches now lead with "export the MIT MeshSegNet weights to ONNX" (gated on license clearance), heuristic stays the fallback, weights stay user-supplied (never committed). - Data & licensing and Estimate sections updated; estimate split into Phase 1 (no model, ~1-2 days) vs the model spike (~2-4 days, license-gated). - Handoff prompt: replaced the "confirm if an ONNX export is available" line with the confirmed availability + licensing facts and an explicit "Phase 1 assumes no weights / no torch; model export is a separate spike". Docs only.

…ack + ml-seg extra + crown-compactness metric) Optional on-device ONNX segmenter dropping into load_local_segmenter() behind an install/weights check, with the heuristic as the always-on fallback. No model, no torch at runtime; weights are user-supplied via $OPENSOURCE_ORTHO_SEG_WEIGHTS and never committed. Adds the ml-seg extra, a crown-compactness measurement metric + lab case, and tests for loader preference/fallback and the label->FDI contract. Removes the now-obsolete Phase 1 kickoff handoff prompt.
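
The loader preference described above can be sketched roughly as follows. This is a minimal illustration, not the repository's load_local_segmenter(): only the environment variable name comes from the commit message, while `choose_backend`, its parameters, and the returned labels are hypothetical.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

WEIGHTS_ENV = "OPENSOURCE_ORTHO_SEG_WEIGHTS"  # env var named in the commit message


def onnxruntime_installed() -> bool:
    """Probe for the optional runtime without importing it."""
    return importlib.util.find_spec("onnxruntime") is not None


def choose_backend(
    env: Optional[Mapping[str, str]] = None,
    runtime_ok: Optional[bool] = None,
    weights_exist: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Return "onnx" only when runtime AND user-supplied weights are present.

    Hypothetical helper: every other situation falls back to the heuristic,
    which stays the always-on default.
    """
    env = os.environ if env is None else env
    runtime_ok = onnxruntime_installed() if runtime_ok is None else runtime_ok
    weights = env.get(WEIGHTS_ENV, "")
    if runtime_ok and weights and weights_exist(weights):
        return "onnx"
    return "heuristic"
```

Injecting `env`, `runtime_ok`, and `weights_exist` keeps the preference logic testable without installing the extra or shipping weights.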

…l context, drop Odysseus) - Collapsing the chat now slides it off-screen as a pop-out drawer with a fixed reopen tab, instead of squashing the panel into a vertical bar. - Replace the AI Basic/Advanced toggle + free-text model + provider select with a single model dropdown; each option carries its provider (Local helper, GPT-5.5, GPT-5.4, Claude Opus 4.8, Claude Sonnet 4.7, open-source endpoint). - Remove the context-scope selector: the assistant always uses the full plan context; the egress-consent gate remains the sole control on external sharing. - Remove the Odysseus connector (kind, catalog, provider build) and its doc/UI references; open-source/self-hosted endpoints cover that use case. - Update OpenAI_Agents.md and ui/README.md for the new model-provider behavior.

…proximity map + scale) New orthoplan/occlusion/ package. The occlusal grid buckets two opposing arches into a shared xy grid of per-cell biting-surface heights; the signed clearance is the substrate the proximity overlay and these metrics share. register_bite trusts a real export's as-scanned bite (identity) when the arches already occlude, and falls back to a clearly-flagged approximate alignment otherwise. Adds a synthetic opposing-arch fixture, an occlusion-registration-accuracy lab case, 7 unit tests (incl. the real bundled scans), and docs/occlusion-registration.md. No runtime deps, no API/UI change yet.
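
One way to picture the occlusal grid is the sketch below. It assumes z points up, the upper arch sits above the lower one, and each arch is given as a plain list of surface points; the function names and the 1 mm default cell size are illustrative, not taken from orthoplan/occlusion/.

```python
from math import floor
from typing import Callable, Dict, Iterable, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]


def bucket_heights(
    points: Iterable[Point], cell_mm: float, pick: Callable[[float, float], float]
) -> Dict[Cell, float]:
    """Keep one biting-surface height per xy cell (min or max z via `pick`)."""
    grid: Dict[Cell, float] = {}
    for x, y, z in points:
        cell = (floor(x / cell_mm), floor(y / cell_mm))
        grid[cell] = z if cell not in grid else pick(grid[cell], z)
    return grid


def signed_clearance(
    upper: Iterable[Point], lower: Iterable[Point], cell_mm: float = 1.0
) -> Dict[Cell, float]:
    """Per-cell gap between the arches; negative values mean overlap.

    Assumption: the upper arch bites with its lowest z, the lower arch with
    its highest z. Cells covered by only one arch are omitted.
    """
    up = bucket_heights(upper, cell_mm, min)
    lo = bucket_heights(lower, cell_mm, max)
    return {cell: up[cell] - lo[cell] for cell in up.keys() & lo.keys()}
```

A proximity overlay and summary metrics could both read from the same dictionary, which is the "shared substrate" idea in the message.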

Add ReviewTier model (STL_ONLY / ENHANCED_RECORDS / CBCT_ATTACHED / ROOT_BONE_AWARE) as the shared classification of a plan's evidence base. Root/bone-aware is fail-closed: DataAvailability flags or a bare CBCT attachment can never promote past CBCT_ATTACHED until registration and reviewed anatomy exist (Phases 5-7). Persist a queryable CaseProvenance digest (scan provenance, units, arch, modality, file ids, engine version, review tier) on each PlanVersion and surface it in case_api list/version payloads. Add review_tier to the evaluate output and a review-tier banner to the browser UI. Completes Phase 1 intake items. Tests: pytest tests/test_review_tier.py tests/test_case_api.py tests/test_api.py tests/test_examples.py; node --test (ui)
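
A fail-closed tier classification along these lines might look like the sketch below. The four tier names come from the commit message; the numeric ordering, the `classify_tier` function, and its boolean inputs are assumptions made for illustration.

```python
from enum import IntEnum


class ReviewTier(IntEnum):
    STL_ONLY = 0
    ENHANCED_RECORDS = 1
    CBCT_ATTACHED = 2
    ROOT_BONE_AWARE = 3


def classify_tier(
    has_enhanced_records: bool = False,
    cbct_attached: bool = False,
    registration_ready: bool = False,
    trusted_anatomy: bool = False,
) -> ReviewTier:
    """Promote only on positive evidence; anything missing keeps the lower tier."""
    if cbct_attached and registration_ready and trusted_anatomy:
        return ReviewTier.ROOT_BONE_AWARE
    if cbct_attached:
        # A bare attachment can never promote past CBCT_ATTACHED.
        return ReviewTier.CBCT_ATTACHED
    if has_enhanced_records:
        return ReviewTier.ENHANCED_RECORDS
    return ReviewTier.STL_ONLY
```

For example, `classify_tier(cbct_attached=True, registration_ready=True)` still yields `CBCT_ATTACHED`, because no reviewed anatomy is present.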

Make the scan-scale gate a consistent cross-rule contract: the segmented crown collision check now defers with a NOTICE when scan units are unverified, mirroring the existing movement-cap gate, so no millimeter finding is reported on untrusted scale. Add review tier and an explicit unresolved-anatomy gap list (roots, alveolar bone, periodontal status, occlusion, CBCT anatomy) to the handoff report. The list is fail-closed: every blind domain is named unless the plan reaches root/bone-aware review. The remaining Phase 2 items (auto-segmentation via /api/segment, the segmentation review UI, per-tooth fragment rendering, segmented-crown movement, and the shared-engine checks) were already in place; this commit closes the scale-gate and report items. Tests: pytest tests/test_collision_optimizer_cases.py tests/test_reporting.py tests/test_review_tier.py

Stage print exports now transform actual per-tooth fragment vertices for reviewed segmentation links whose meshes resolve in the workspace. Every other tooth (unreviewed link, or unresolvable/missing geometry) falls closed to a clearly-labeled schematic proxy box - a SegmentedToothMesh gains a 'reviewed' flag that gates real-vertex use. Bump the manifest to v2: add a hashes block (plan, stage frames, findings, original scans, segmentation fragments) alongside the existing per-artifact hashes, plus the review tier label and a uses_real_mesh_geometry flag. The print-package payload echoes the tier and the real-geometry flag. Split STL geometry generation into orthoplan/print_stl.py to keep both modules under the maintainability cap (geometry vs packaging). Tests: pytest tests/test_printing.py (real-mesh, fail-closed x2, v2 manifest)

Add an optional 'dicom' extra (pydicom). New dicom_intake module parses ONLY structural study metadata (modality, voxel spacing, dimensions, orientation, study date) with stop_before_pixels; patient identifiers are never copied (PHI_TAGS_EXCLUDED) and volume bytes never enter plan JSON. Intake fails closed when the extra is absent. CaseRecord gains an optional redacted DicomMetadata; record_workspace parses it for cbct/dicom records. Add a fail-closed CBCT lifecycle status (unavailable/attached/registered/anatomy-reviewed) and a 3D Slicer handoff path (cbct_handoff) surfaced in the evaluate output and a CBCT panel in the browser UI. Lock the invariant that a CBCT attachment does not change movement generation. Tests: pytest tests/test_dicom_intake.py (PHI redaction, fail-closed, status, handoff, generation invariance); node --test (ui)

Add RegistrationTransform (source STL asset, target CBCT record, 4x4 affine matrix, method, operator/model provenance, RegistrationQuality, notes) with cross-reference validation on the plan: a registration can only point at a real mesh asset and a real CBCT/DICOM record. Acceptance is fail-closed (is_acceptable = accepted AND quality present). registration_ready/accepted_registration gate CBCT-derived behavior; cbct_status reports REGISTERED only when ready; root/bone-aware still requires reviewed anatomy on top. Manual and imported transforms work today; an Open3D ICP experiment (registration_auto) is gated behind the mesh-processing extra and fails closed when absent. Expose registration + quality in the evaluate output and the browser CBCT panel. Tests: pytest tests/test_registration.py (matrix validation, fail-closed acceptance, cross-ref rejection, gating, Open3D-absent path)
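
The acceptance rule (is_acceptable = accepted AND quality present) and a basic matrix check can be sketched as below. The class and field names here are hypothetical stand-ins for RegistrationTransform, and the bottom-row check is an assumption about what "matrix validation" covers.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def is_valid_affine(matrix: Matrix, tol: float = 1e-9) -> bool:
    """Check shape 4x4, finite entries, and a bottom row of (0, 0, 0, 1)."""
    if len(matrix) != 4 or any(len(row) != 4 for row in matrix):
        return False
    if not all(math.isfinite(v) for row in matrix for v in row):
        return False
    return all(abs(a - b) <= tol for a, b in zip(matrix[3], (0.0, 0.0, 0.0, 1.0)))


@dataclass(frozen=True)
class RegistrationSketch:
    """Hypothetical, trimmed-down stand-in for the real model."""

    accepted: bool = False
    quality_rms_mm: Optional[float] = None  # assumed quality field

    @property
    def is_acceptable(self) -> bool:
        # Fail-closed: acceptance without a recorded quality does not count.
        return self.accepted and self.quality_rms_mm is not None
```

Under this sketch an operator ticking "accepted" on a transform with no quality record still leaves CBCT-derived behavior gated off.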

Add provenance-bound, fail-closed derived-anatomy models: RootGeometry (per-tooth root mesh and/or centerline), ToothAxis (trusted long axis), and AlveolarBoneRecord, all carrying source CBCT record, registration id, model/operator provenance, confidence, and a ReviewStatus. An object is 'trusted' only when accepted/corrected AND in field; proposed, rejected, uncertain, or out-of-field anatomy is never trusted. Plan gains a derived_anatomy container with reference validation (every object must trace to a real CBCT record, registration, and mesh). root_bone_aware_ready now requires registration_ready AND a trusted object, so root/bone-aware review (and the ROOT_BONE_AWARE tier / anatomy-reviewed CBCT status) only unlocks behind real reviewed anatomy. Surface per-object trust flags in the evaluate output and add a browser review panel with accept/correct/reject controls that re-evaluate. Tests: pytest tests/test_anatomy.py (trust logic, reference rejection, fail-closed tiers, per-object trust flags)
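
The trust rule reads naturally as a small predicate. The sketch below assumes a five-value ReviewStatus and an `in_field` flag per object; the enum members mirror the words in the commit message, but the exact model fields in the repository may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class ReviewStatus(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    CORRECTED = "corrected"
    REJECTED = "rejected"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class AnatomySketch:
    """Hypothetical stand-in for RootGeometry / ToothAxis / AlveolarBoneRecord."""

    status: ReviewStatus
    in_field: bool


def is_trusted(obj: AnatomySketch) -> bool:
    """Trusted only when accepted/corrected AND inside the CBCT field of view."""
    return obj.in_field and obj.status in (ReviewStatus.ACCEPTED, ReviewStatus.CORRECTED)


def root_bone_aware_ready(registration_ready: bool, objects: Iterable[AnatomySketch]) -> bool:
    """Requires an acceptable registration AND at least one trusted object."""
    return registration_ready and any(is_trusted(o) for o in objects)
```

Because `any()` over an empty collection is false, a plan with no derived anatomy stays below the root/bone-aware tier even with a perfect registration.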

Add a deterministic root/bone-aware rule that runs only behind trusted CBCT-derived anatomy: root proximity and inter-root collision (on reviewed root centerlines, after planned movement), cortical-boundary proximity (against reviewed alveolar bone bounds), and root/bone context for tip/torque/intrusion/extrusion/expansion movements on teeth with reviewed root/axis anatomy. Fail-closed: when a CBCT is attached but registration, segmentation, or reviewed anatomy is insufficient, it emits 'cannot assess' notices rather than guessing; STL-only plans stay silent (the data-gap layer already reports CBCT unavailable). The structured review verdict is limited to CONSISTENT, ISSUES, NOT_APPLICABLE and is surfaced in the evaluate output. Tests: pytest tests/test_root_bone.py (fixture geometry: not-applicable, cannot-assess, proximity ISSUES, cortical breach, movement context)

Add a browser-generated stored-review export endpoint for mobile handoff, including review tier, data gaps, CBCT/root-bone status, edit-lock metadata, handoff URLs, and review hashes. Harden the export contract by percent-encoding case IDs in links, accepting only http/https base URLs, making review_sha256 verifiable by excluding itself from the digest payload, and sanitizing browser download filenames. Wire the browser export action and document the mobile API contract. Add pure export tests plus live server route coverage for encoded handoff links. Verification: pytest; pytest tests/test_server.py; pytest tests/test_case_review.py tests/test_mobile_contract.py; npm test
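
The hardening steps above (self-excluding digest, http/https-only base URLs, percent-encoded case IDs) can be illustrated with the standard library alone. In this sketch the canonical JSON form and the `/cases/` path segment are assumptions; only the review_sha256 field name comes from the commit message.

```python
import hashlib
import json
from urllib.parse import quote, urlsplit


def review_digest(payload: dict) -> str:
    """SHA-256 over canonical JSON of everything except the digest field itself."""
    body = {k: v for k, v in payload.items() if k != "review_sha256"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(payload: dict) -> dict:
    """Return a copy of the payload carrying its own verifiable digest."""
    return {**payload, "review_sha256": review_digest(payload)}


def verify(payload: dict) -> bool:
    """Recompute the digest; excluding the field makes this reproducible."""
    return payload.get("review_sha256") == review_digest(payload)


def handoff_url(base_url: str, case_id: str) -> str:
    """Build an open-link; the '/cases/' route is a hypothetical example."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("base URL must be an absolute http or https URL")
    return f"{base_url.rstrip('/')}/cases/{quote(case_id, safe='')}"
```

Passing `safe=''` to `quote` matters: the default leaves `/` unescaped, so a case ID containing a slash would otherwise change the path structure of the link.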

Add a case-review export: a self-contained, opaque JSON document a mobile client stores as a read-only review. It carries the review tier, unresolved data gaps, finding counts, CBCT and root/bone verdicts, an explicit edit-lock (requires_browser_engine), and a content digest. Add a handoff descriptor (open URL / deep link / QR payload) for reopening the same local or hosted case on a device; base URLs are validated to http/https and ids are percent-encoded. Expose the builder via POST /api/case-review (server dispatch refactored into a helper) and a browser 'Export case review (mobile handoff)' action. Document the endpoint and the mobile display requirements (tier, gaps, edit-lock) in the mobile API contract. Tests: pytest tests/test_case_review.py tests/test_server_case_review.py

Add an orthoplan-case-review-v1 golden fixture and server-side golden regression coverage so mobile schema drift is caught. Teach iOS and Android to decode validated stored-review JSON, reject non-schema browser-review imports, display review tier/data gaps/edit-lock status, and expose browser/deep-link open paths from the handoff payload. Add a self-contained browser QR SVG renderer for case handoff payloads, render the QR after case-review export, and cover the renderer with UI unit tests. Verification: pytest; pytest tests/test_case_review.py tests/test_case_review_golden.py tests/test_mobile_contract.py; npm test; swift test; ANDROID_HOME=/Users/johnlaw/Library/Android/sdk gradle testDebugUnitTest; xcodebuild build -scheme OpenSourceOrthoLite -project mobile/ios/OpenSourceOrthoLite.xcodeproj -destination 'platform=iOS Simulator,name=iPhone 17,OS=26.5'

Add the manufacturing step that turns a stage model into a printable aligner shell. aligner_shell.py offsets the reviewed stage surface outward along vertex normals by the sheet thickness and closes it into a watertight solid (outer offset + reversed inner cavity + stitched rim), with an optional gingival trim plane. print_aligner.py emits a per-stage shell STL from real reviewed geometry only (never proxy teeth); the gingival trim is derived from trusted CBCT tooth axes and is fail-closed (no trim when the occlusal direction is unknown). Print settings gain aligner_shell_enabled, sheet_thickness_mm, and gingival_trim_margin_mm; the manifest records shell artifacts with thickness, watertight flag, trim status, and hashes. Browser print panel gets the shell toggle + thickness/trim inputs. This is geometry generation, not a clinical claim: printing, fit, and physical use remain the user's own responsibility and risk. A robust mesh-library offset/boolean path is left as a future enhancement over the pure-Python vertex-normal approximation. Rewrote TODO.md: condensed the completed v1.2 phases and added the effectiveness roadmap (phases 9-14) targeting each honest-rating track to >= 7/10. Tests: pytest tests/test_aligner_shell.py tests/test_printing.py
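
The core of the vertex-normal approximation is a per-vertex displacement by the sheet thickness. The sketch below shows only that step, under the assumption that per-vertex normals are already available; closing the result into a watertight solid (reversed inner cavity, stitched rim) and the gingival trim are not shown, and `offset_vertices` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def offset_vertices(
    vertices: Sequence[Vec3], normals: Sequence[Vec3], thickness_mm: float
) -> List[Vec3]:
    """Move each vertex outward along its (normalized) normal by thickness_mm."""
    if len(vertices) != len(normals):
        raise ValueError("one normal per vertex is required")
    shifted: List[Vec3] = []
    for (x, y, z), (nx, ny, nz) in zip(vertices, normals):
        length = math.sqrt(nx * nx + ny * ny + nz * nz)
        if length == 0.0:
            # Fail closed rather than guess a direction for a degenerate normal.
            raise ValueError("zero-length vertex normal")
        scale = thickness_mm / length
        shifted.append((x + nx * scale, y + ny * scale, z + nz * scale))
    return shifted
```

On curved or creased regions this kind of offset does not give a uniform wall, which is consistent with the message calling a mesh-library offset/boolean path a future enhancement.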

Use trusted reviewed root geometry and tooth axes to rotate real per-tooth mesh vertices about a root-derived center of resistance. Expose the movement model in evaluation output while failing closed to the existing crown-centroid visualization path when trusted anatomy is unavailable. Add regression coverage for unchanged no-root movement output and root-apex opposite motion under tipping.

Store capped representative surface samples on segmented tooth links and populate them from local segmentation. Use bbox prefiltering plus transformed adjacent same-arch sample distances to report contact and estimated IPR, with scale gating and a labeled bbox fallback when samples are unavailable. Add overlap, clear-pair, fallback, and segmentation-fragment coverage.

Add a synthetic validation benchmark report covering segmentation Dice/IoU, movement millimeter error, collision/IPR precision-recall, and shell thickness error. Expose the report through the validation package and a validation-benchmark CLI command with JSON output. Keep benchmark metrics as tracked reported numbers rather than pass/fail gates, with caveats for synthetic fixtures and future reviewed open datasets.
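
For reference, the two segmentation overlap metrics named above have standard definitions, sketched here over sets of face (or vertex) indices. Treating two empty sets as a perfect match is an assumption of this sketch, not something the commit message states.

```python
from typing import AbstractSet, Hashable


def dice(predicted: AbstractSet[Hashable], truth: AbstractSet[Hashable]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|)."""
    total = len(predicted) + len(truth)
    if total == 0:
        return 1.0  # assumption: nothing predicted, nothing expected
    return 2.0 * len(predicted & truth) / total


def iou(predicted: AbstractSet[Hashable], truth: AbstractSet[Hashable]) -> float:
    """IoU = |A ∩ B| / |A ∪ B|."""
    union = len(predicted | truth)
    if union == 0:
        return 1.0  # same assumption as above
    return len(predicted & truth) / union
```

With predicted faces {1, 2, 3} and reference faces {2, 3, 4}, Dice is 4/6 and IoU is 2/4, which shows why Dice always reads at least as high as IoU on the same pair.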

Clean shell input geometry by welding near-duplicate vertices, dropping degenerate triangles, and failing closed when no usable reviewed surface remains. Expand shell QA with thickness distribution, watertightness, connected components, cleanup counts, shell hashes, and per-stage fail-closed reports. Surface manufacturing-readiness verdicts in API/export status and print-package manifests using CONSISTENT, ISSUES, and NOT_APPLICABLE vocabulary.
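
A simple version of the cleanup pass can be sketched as a quantized weld followed by a degenerate-triangle filter. The function name, the tolerance default, and the grid-rounding strategy are assumptions; note that rounding to a grid can miss near-duplicates that straddle a cell boundary, and this sketch only drops triangles whose indices collapse, not zero-area triangles with three distinct vertices.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


def weld_and_clean(
    vertices: Sequence[Vec3], triangles: Sequence[Tri], tol: float = 1e-6
) -> Tuple[List[Vec3], List[Tri]]:
    """Merge vertices that round to the same tol-sized cell, then drop collapsed faces."""
    key_to_index: Dict[Tuple[int, int, int], int] = {}
    welded: List[Vec3] = []
    remap: List[int] = []
    for v in vertices:
        key = (round(v[0] / tol), round(v[1] / tol), round(v[2] / tol))
        if key not in key_to_index:
            key_to_index[key] = len(welded)
            welded.append(v)
        remap.append(key_to_index[key])
    kept: List[Tri] = []
    for a, b, c in triangles:
        a, b, c = remap[a], remap[b], remap[c]
        if len({a, b, c}) == 3:  # a repeated index means the face collapsed
            kept.append((a, b, c))
    if not kept:
        # Fail closed: no usable surface means no shell.
        raise ValueError("no usable surface remains after cleanup")
    return welded, kept
```

Counting `len(vertices) - len(welded)` and `len(triangles) - len(kept)` would give the kind of cleanup counts the QA report mentions.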

Downgrade Track 1 to an honest ~7/10 and add docs/application maturity.md to define the three tracked maturity surfaces, current scores, gaps, and 10/10 criteria. Extend shell manufacturing QA with printer tolerance settings, rim closure, approximate self-intersection signals, inner/outer clearance, sliver reporting, API shell-QA readiness findings, and print-package manufacturing summaries. Refresh stale CBCT/root-bone documentation, regenerate examples and the mobile case-review golden fixture, and add coverage for inverted winding, disconnected islands, skinny triangles, bad geometry, and package-payload QA summary.

Track 1 (upload -> printable aligner artifacts) accuracy + surfacing. Accuracy: - build_aligner_shell now bakes printer XY/Z dimensional compensation into the exported shell geometry instead of only reporting it in the manifest. The bias is applied along vertex normals (XY gain in-plane, Z gain on the build axis) to BOTH the inner cavity and outer surfaces, so it shifts the part's outer dimensions to cancel printer over/under-cure without changing wall thickness. Applied values are recorded in ShellStats and echoed back through the shell QA block. Previously the manifest advertised a compensation profile the STL did not contain. - solid_stl now writes the real unit outward facet normal per triangle, computed from winding (right-hand rule), instead of a placeholder "0 0 0". Degenerate facets still emit 0 0 0. Applies to both stage-model and shell exports, keeping the export spec-correct for strict CAD/validation tools. UI surfacing: - The guided print step and the technician print panel now show the manufacturing-readiness verdict, the applied printer compensation, and a per-stage shell QA view (watertight / thickness range / self-intersection signals / skip reason). The backend already computed and returned this data; it was previously invisible to the user. Tests: - Shell compensation defaults/Z-shift/wall-thickness-preservation, manifest compensation reporting, real STL facet normals, and DOM-free UI rendering of the print QA panel. Docs: application maturity Track 1 "what exists" updated; TODO status note added.
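
The facet-normal fix follows the usual right-hand rule: the normal is the normalized cross product of two edge vectors taken in winding order. The sketch below is an illustration of that rule and of one ASCII STL facet record; it is not the repository's solid_stl, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def facet_normal(a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Unit normal of triangle a-b-c by the right-hand rule; zeros if degenerate."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate facets still emit 0 0 0
    return (nx / length, ny / length, nz / length)


def facet_lines(a: Vec3, b: Vec3, c: Vec3) -> List[str]:
    """Render one facet in ASCII STL syntax with its computed normal."""
    n = facet_normal(a, b, c)
    lines = [f"facet normal {n[0]:.6e} {n[1]:.6e} {n[2]:.6e}", "  outer loop"]
    lines += [f"    vertex {p[0]:.6e} {p[1]:.6e} {p[2]:.6e}" for p in (a, b, c)]
    lines += ["  endloop", "endfacet"]
    return lines
```

Swapping any two vertices flips the winding and therefore the sign of the normal, which is why consistent outward winding has to be established before export.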

…, oracle Raises Track 1 (upload -> printable aligner artifacts) to ~8/10 by removing three of the gating limitations: approximate self-intersection, opaque verdicts, and no independent known-good comparison. Self-intersection + nonmanifold engine: - New orthoplan/mesh_intersect.py implements a deterministic Möller (1997) triangle-triangle intersection test, pure-Python and dependency-free. - count_self_intersections now uses it as an exact narrow phase behind the existing AABB broad phase (made inclusive so a true crossing on a shared boundary plane, or a triangle lying in a coordinate plane, is not pre-rejected). This replaces the box-overlap approximation that both over-counted and under-proved. - Added count_nonmanifold_edges and a nonmanifold_edge_count shell stat (edges shared by more than two faces). Per-artifact QA explanations: - print_aligner._failed_checks produces a named list of exactly which deterministic checks downgraded a shell to ISSUES (watertight, nonmanifold, disconnected pieces, rim closure, self-intersection at zero tolerance, thin walls, inner/outer clearance). With a real engine, the self-intersection tolerance is now zero rather than the approximation's allowance. The verdict is derived from this list, and the manifest + guided/technician print UI surface the named reasons. Verification (independent ground truth + messy corpus): - tests/test_mesh_intersect.py: engine cases (piercing, separated, parallel, coplanar overlap/disjoint) and the count functions. - tests/test_shell_quality_engine.py: a closed-form slab-volume oracle (math, not the builder) confirming the flat-quad shell encloses area*thickness; flat-input compensation is a pure translation; and synthetic messy fixtures whose defects the QA must name. - ui/print_qa.test.js: failed_checks rendering. Docs: application maturity Track 1 rated ~8/10 with updated gaps; TODO snapshot and status updated.
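
The slab-volume oracle idea can be illustrated with the divergence-theorem volume of a closed triangle mesh: for a flat rectangular patch thickened into a slab, the enclosed volume must equal area times thickness. In this sketch the hand-built box stands in for a shell produced by the builder, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


def signed_volume(vertices: Sequence[Vec3], triangles: Sequence[Tri]) -> float:
    """Sum of signed tetrahedron volumes; positive for outward-wound closed meshes."""
    total = 0.0
    for i, j, k in triangles:
        (ax, ay, az), (bx, by, bz), (cx, cy, cz) = vertices[i], vertices[j], vertices[k]
        total += (
            ax * (by * cz - bz * cy)
            + ay * (bz * cx - bx * cz)
            + az * (bx * cy - by * cx)
        )
    return total / 6.0


def slab(width: float, depth: float, thickness: float) -> Tuple[List[Vec3], List[Tri]]:
    """Axis-aligned box with outward-facing winding (two triangles per face)."""
    w, d, t = width, depth, thickness
    verts = [(0, 0, 0), (w, 0, 0), (w, d, 0), (0, d, 0),
             (0, 0, t), (w, 0, t), (w, d, t), (0, d, t)]
    tris = [(0, 3, 2), (0, 2, 1),   # bottom, normal -z
            (4, 5, 6), (4, 6, 7),   # top, normal +z
            (0, 1, 5), (0, 5, 4),   # front, normal -y
            (3, 7, 6), (3, 6, 2),   # back, normal +y
            (0, 4, 7), (0, 7, 3),   # left, normal -x
            (1, 2, 6), (1, 6, 5)]   # right, normal +x
    return verts, tris
```

Because the expected value comes from arithmetic rather than from the code under test, a mismatch points at the builder (or its winding) and not at the oracle.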

… fallback) Starts the real 8->9 lever for Track 1 (a robust mesh backend) with a safe, mergeable first slice: backend selection, an optional Open3D repair path, and a fail-closed fallback. The rating is intentionally NOT bumped to 9 - the robust offset is repair-only and unvalidated in CI (Open3D is not installed), exactly like the existing automatic-registration experiment. Backend: - New orthoplan/aligner_shell_robust.py: optional Open3D mesh-repair shell backend behind the mesh-processing extra (merge near-duplicate vertices, drop degenerate triangles, remove non-manifold edges, orient consistently, recompute robust normals), then reuse the shared offset. Mirrors registration_auto.py: guarded import, robust_shell_available() probe, RobustShellUnavailable, and a pragma-no-cover heavy-import body. - settings.shell_backend ("pure-python" default | "robust"). When robust is requested but Open3D is missing, the export falls back to pure-Python and records the downgrade (fallback_reason) - it never silently changes geometry. - resolve_shell_backend() identity (requested/used/available/fallback_reason) flows into the manifest (aligner_shells.backend), PrintPackageResult, the API response (aligner_shell_backend), and the guided/technician print QA UI. Refactor (single source of truth, maintainability caps): - Extracted assemble_shell() in aligner_shell.py so both backends share the offset, rim stitching, printer compensation, and QA block. - Moved mesh indexing/topology helpers into new aligner_shell_topology.py to keep aligner_shell.py under the 300-line file cap. Tests + fixtures: - tests/test_shell_backend.py: default backend, resolution, fail-closed fallback when Open3D is absent, and the manifest recording the downgrade; plus a UI test for the backend line. - Regenerated mobile/fixtures/case-review-v1.json and examples/*.json for the new shell_backend settings field. 
Docs: docs/aligner-shell-backend.md (status, contract, fail-closed behavior, what a validated 9/10 needs); maturity Track 1 + TODO Phase 9 follow-up updated.
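
The fail-closed backend selection can be sketched as a small resolver that always records what was asked for and what was actually used. The four field names follow the identity listed above (requested/used/available/fallback_reason); the function signature, the reason string, and the probe are assumptions.

```python
import importlib.util
from dataclasses import dataclass
from typing import Optional

BACKENDS = ("pure-python", "robust")


@dataclass(frozen=True)
class BackendResolution:
    requested: str
    used: str
    available: bool
    fallback_reason: Optional[str] = None


def open3d_available() -> bool:
    """Probe for the optional extra without importing the heavy module."""
    return importlib.util.find_spec("open3d") is not None


def resolve_shell_backend(
    requested: str = "pure-python", robust_available: Optional[bool] = None
) -> BackendResolution:
    """Never change geometry silently: a downgrade is recorded with its reason."""
    if requested not in BACKENDS:
        raise ValueError(f"unknown shell backend: {requested!r}")
    available = open3d_available() if robust_available is None else robust_available
    if requested == "robust" and not available:
        return BackendResolution(requested, "pure-python", False, "Open3D is not installed")
    return BackendResolution(requested, requested, available)
```

Carrying the whole resolution object into the manifest and API response, rather than just the `used` string, is what lets a reviewer see that a requested robust build was downgraded.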

Roadmap-only update (no code change). - Added an ordered "Path to Track 1 ~9/10" section sequencing the remaining work: 9.1 scale the pure-Python shell QA -> 9.2 true boolean/SDF (Minkowski) offset in the robust backend -> 9.3 install Open3D in a test env and validate the robust backend vs pure-Python QA on a messy corpus (the actual ~8 -> ~9 move) -> 9.4 full-arch known-good fixtures from an independent mesh pipeline. Noted that a 10/10 is intentionally off-path: it would require material/thermoforming/fit/ printer-calibration/physical validation that this safety-boundary-first toolkit deliberately does not model. - Added Phase 9.1 (PRIORITY) for the spatial-grid fix, with the reason it is a priority: the real triangle-triangle self-intersection engine and the min_inner_outer_clearance check are O(n^2)/O(V^2) and run on every pure-Python shell build (measured ~16.7s at ~8,460 shell triangles). Because the pure-Python path is the always-on default, this must land before the shell QA is run on real multi-tooth reviewed plans, or per-stage builds will hang. Tasks: spatial-grid broad phase for self-intersection, spatial nearest-neighbor for clearance, a wall-clock perf regression test, and identical results on existing fixtures. - Relabeled the robust-backend follow-ups as Phase 9.2-9.4 to match the path.
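
A spatial-grid broad phase of the kind proposed for Phase 9.1 can be sketched as follows: each triangle's bounding box is registered in every uniform grid cell it touches, and only triangles sharing a cell are compared. The cell size and function names are illustrative, and the exact narrow-phase triangle test is out of scope here.

```python
from collections import defaultdict
from itertools import combinations, product
from math import floor
from typing import Dict, List, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]
Box = Tuple[Vec3, Vec3]


def aabb(tri: Triangle) -> Box:
    xs, ys, zs = zip(*tri)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_overlap(a: Box, b: Box) -> bool:
    """Inclusive test, so boxes touching on a shared plane still count."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def candidate_pairs(triangles: Sequence[Triangle], cell: float) -> Set[Tuple[int, int]]:
    """Index pairs (i < j) whose boxes overlap, found via a uniform grid."""
    boxes = [aabb(t) for t in triangles]
    grid: Dict[Tuple[int, int, int], List[int]] = defaultdict(list)
    for idx, (lo, hi) in enumerate(boxes):
        spans = [range(floor(lo[i] / cell), floor(hi[i] / cell) + 1) for i in range(3)]
        for key in product(*spans):
            grid[key].append(idx)
    pairs: Set[Tuple[int, int]] = set()
    for members in grid.values():
        for i, j in combinations(members, 2):  # members are in ascending order
            if boxes_overlap(boxes[i], boxes[j]):
                pairs.add((i, j))
    return pairs
```

Two overlapping boxes always share at least one grid cell, so the grid returns the same pairs as the all-pairs scan; that equivalence is what a parity test against existing fixtures would check.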

Roadmap-only update (no code change). Captures the current clunky state of the in-app assistant, grounded in the code: - single-turn with no memory (answer_chat_payload sends only [user, assistant]; no prior turns are threaded back), so a real back-and-forth is impossible; - no streaming (full answer appears at once after a static status line); - one combined provider+model <select> with a hardcoded option list and models "configured externally" (no per-provider model selection); - renderChat rebuilds chatMessages.innerHTML every render, so scroll/focus churn and no incremental append. Adds goals in two groups plus safety constraints: - Goal A (conversational flow): thread bounded conversation history, incremental append with preserved scroll/auto-scroll/focus, pending indicator + Enter-to-send/Shift+Enter, and token streaming with a non-streaming fallback. - Goal B (Cursor-style selection): split into provider then model, give each connector a real selectable model list (plus free-text for self-hosted), surface key/PHI affordances, keep the local helper as the default no-key option. - Safety (unchanged): keep model output separate from deterministic findings with the lint_finding() gate, per-request credentials never stored, and the PHI-share acknowledgement + shares_patient_data labeling before any non-local provider receives plan context.

Roadmap/docs-only update (no code change). Makes the docs consistent with the directive that every focus area is a committed ≥9/10 target - previously only Track 1 had a documented target and ordered path. application maturity.md: - Summary table gains a Target column; all surfaces marked ≥9/10. - Added per-track "Target: ≥9/10" lines pointing at the TODO ordered paths. - Reframed each "What 10/10 would require" list as "What reaching the ≥9/10 target requires"; reaffirmed in the intro that 10/10 is intentionally NOT a target for the geometry tracks (no material/fit/physical-use modeling). - Added Track 4: In-App AI Assistant (Chat) as a tracked surface (~4/10 -> ≥9/10) with what-exists, why-not-higher, and the path requirements. TODO.md: - Honest-effectiveness snapshot gains a Target column and a chat row. - Replaced the single "Path to Track 1 ~9/10" with a "Targets: all four surfaces to ≥9/10" section containing ordered paths for Tracks 1-4, each referencing the existing phases. - Added Phase 16 (full triangle-level collision/IPR for Track 2) so the Track 2 path reference resolves.

Roadmap/docs-only update (no code change). TODO.md: - Replaced the four separate per-track "Path to ~9/10" blocks with one global "Order of operations to ≥9/10" section, grouped into dependency waves: Wave 0 Phase 9.1 (shell QA perf - unblocks real arch use, do first) Wave 1 Phase 15 (chat) + Phase 13 (benchmark corpus) - independent, parallel Wave 2 Open3D test env -> 9.2/9.3 (robust offset + validation) -> 16 -> 9.4 Wave 3 Phase 14 (learned segmentation, benchmarked) Wave 4 Phase 12a -> 12b -> 12c (CBCT raw volume; longest road) - Reordered the detailed phase sections to match that execution order (Phase 15 now precedes Phase 16) and tagged each section with its wave. - Made the shared Open3D test-environment prerequisite explicit in Wave 2. Removed stale info: - The "current status" QA bullet no longer describes self-intersection as "approximate signals" - the real triangle-triangle engine replaced it. - application maturity.md cross-references now point at the new "Order of operations" section instead of the deleted per-track "Path to" blocks.

Update TODO.md to remove stale status, mark completed shell QA, benchmark, chat, and initial triangle-contact work, and document the remaining streaming/Open3D/full-geometry gaps. Scale pure-Python aligner shell QA by replacing quadratic self-intersection broad phase and inner/outer clearance scans with spatial-grid based exact checks. Add parity and full-arch-scale performance coverage. Expand validation benchmarks with reviewed non-PHI corpus metadata, baseline deltas, messy shell metrics, and sampled-vs-triangle collision distance metrics. Split benchmark models and helper registries into focused modules to keep maintainability guardrails green. Improve plan AI chat with bounded conversation history, incremental message rendering, provider/model split, per-provider model memory, connector model catalogs, custom model IDs, and request-scoped credential/share handling. Add an optional in-memory triangle-level collision/IPR distance path while preserving the sampled and bbox fallback behavior, with tests comparing sampled and triangle contact distances. Tests: tools/check_maintainability.py --strict; pytest (474 passed, 1 skipped); ui npm test (69 passed).

Add reviewed mesh workspace triangle extraction for collision/IPR evaluation so reviewed per-tooth STL assets are loaded from the local mesh registry without serializing geometry into plan JSON. Keep unreviewed, missing, or invalid mesh links fail-closed on the existing sampled/bbox fallback. Add an SSE chat streaming endpoint and UI stream consumer for connectors that advertise streaming, while retaining the existing JSON chat fallback for local and non-streaming providers. Refresh TODO.md to mark Phase 16 and the Phase 15 streaming remainder complete, prune stale roadmap text, and update effectiveness/status estimates. Verification: pytest (478 passed, 1 skipped); cd ui && npm test (69 passed); tools/check_maintainability.py --strict (passed).

Implement the next Phase 9 wave for printable aligner shells by splitting shared shell assembly from backend-specific surface generation, upgrading the optional Open3D robust backend from repair-only behavior to repair plus distance-field offset correction, and adding synthetic messy/full-arch validation metrics for robust-vs-pure shell QA. Wire the validation metrics into the benchmark report and add tests for both no-extra skip behavior and Open3D-enabled validation cases. Add a dedicated mesh-processing CI lane so the optional robust path is exercised without changing the default no-extra install. Clean TODO.md and application maturity docs so completed Phase 9/15/16 work is summarized once, remaining work is limited to Phase 14 segmentation maturity and Phase 12 CBCT/root-bone automation, and current maturity scores reflect the implemented shell, collision, and chat streaming work. Verification: python3 -m pytest tests/test_shell_backend.py tests/test_validation_benchmarks.py tests/test_aligner_shell.py -q; python3 tools/check_maintainability.py --strict; python3 -m pytest -q (with local loopback socket permission).

Guided "Review your plan" step now leads with an at-a-glance dashboard: a verdict hero (ready / needs-review / cannot-assess) plus edit-diff, warnings, root/bone, and print-readiness cards and 3D overlay chips, backed by the new guidedReviewDashboard() in ui/guided.js. Post-implementation audit fixes (found while reviewing the change): - Severity classification: the dashboard counted EVERY finding as a blocking warning. Findings carry severity info|notice|warning, and the rule engine emits an `info` root-bone-context finding for healthy root/bone-aware plans, so a clean plan wrongly read "Needs review — 1 warning(s)". Only warning-severity findings now drive the verdict, summary, and warnings card; unknown/missing severity is treated as a warning (fail-safe surfacing). - Overlay chips: highlights matched code.includes("movement-cap"/ "collision"), which also matched the *-scale-unconfirmed NOTICE codes (check skipped, not violated), producing phantom overlay chips. Highlights are now derived from warning-severity findings only. Tests: realistic fixtures (findings carry severity) plus regressions proving an info finding stays "ready" and skipped-check notices emit no overlay chips. Full suite green (75 UI, 489 Python, maintainability). Docs: README hero image swapped to the intraoral arches photo; application maturity (Track 2) and TODO current-status updated to describe the severity-aware guided review dashboard.
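
The severity rule from the audit fix can be restated compactly. The real logic lives in ui/guided.js; the Python sketch below only mirrors the rule as described (info and notice never block, unknown or missing severity is surfaced as a warning), with hypothetical function names and only two of the three verdict states modeled.

```python
from typing import Iterable, List, Mapping

NON_BLOCKING = ("info", "notice")


def is_warning(finding: Mapping[str, object]) -> bool:
    """Fail-safe: anything that is not explicitly info/notice counts as a warning."""
    return finding.get("severity") not in NON_BLOCKING


def blocking_findings(findings: Iterable[Mapping[str, object]]) -> List[Mapping[str, object]]:
    return [f for f in findings if is_warning(f)]


def verdict(findings: Iterable[Mapping[str, object]]) -> str:
    """Only warning-severity findings drive the verdict ("cannot-assess" not modeled)."""
    return "needs-review" if blocking_findings(findings) else "ready"
```

Deriving overlay highlights from `blocking_findings` rather than from substring matches on finding codes is what removes the phantom chips for skipped-check notices.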

john-lawniczak added 30 commits June 8, 2026 13:21

Make plan AI available across workflow

e8ddbcb

john-lawniczak added 29 commits June 11, 2026 13:28

Complete roadmap and guided edit UX

c5b5219

Improve CBCT volume proposal quality

2cdecd8

john-lawniczak merged commit 450e8d4 into main Jun 12, 2026
3 of 4 checks passed

Feat/v1.2#2
john-lawniczak merged 78 commits into main from feat/v1.2

john-lawniczak commented Jun 9, 2026
