Running log of design decisions, pivots, and open questions for the Tele Assistance System case study. Append only; newest entry on top.
Four cross-cutting changes landed today, all four behind a single pytest suite with 561 passing tests.
src/methods/experimental.py shrunk 1501 → 326 lines (thin facade). Body moved into the procedure/ package per the design intent the package's own __init__.py docstring spelled out at stage 0 (deployment + tuning + experiment triad). Mechanical refactor, zero behaviour change.
- src/experimental/procedure/experiment.py: filled (was empty). run_experiment + 19 helpers (profile extractors, mesh-spec builders, open-loop trial driver, controller polling, bounds gate). 1019 lines.
- src/experimental/procedure/tuning.py: new. run_calibration + _BringUpFactory + _run_workers_scaling + find_latest_envelope. 247 lines.
- src/experimental/procedure/__init__.py: re-exports run_calibration, run_experiment, find_latest_envelope; docstring updated to match the new triad.
- src/methods/experimental.py: thin facade. run(stage) dispatcher + main() + CLI printers + re-exports of every private symbol the test harness reaches into (_admission_lt_from_profile, _resolve_admission, _build_mesh_admission, _dispatch_at_rate, _consume_payloads, _drive_trial, _check_breach, User, etc.).
Back-compat strategy: the facade re-exports private symbols so existing import paths keep working; tests migrate from attribute-form monkeypatch.setattr(experimental, "User", _FakeUser) to string-path form monkeypatch.setattr("src.experimental.procedure.experiment.User", _FakeUser). Patches now hit the lookup site instead of the facade's alias.
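Why the string-path form matters can be shown with two synthetic modules. This is a minimal sketch, not the project's test code: `impl` and `facade` are hypothetical stand-ins for procedure/experiment.py and the thin facade, and `unittest.mock` is used in place of pytest's monkeypatch so the block runs on its own.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-ins: "impl" defines the symbol, "facade" only re-exports it.
impl = types.ModuleType("impl")
exec(
    "class User:\n"
    "    name = 'real'\n"
    "def drive():\n"
    "    return User.name  # resolved in impl's globals at call time\n",
    impl.__dict__,
)
facade = types.ModuleType("facade")
facade.User = impl.User    # re-exported alias
facade.drive = impl.drive
sys.modules["impl"] = impl  # lets the string path "impl.User" resolve


class FakeUser:
    name = "fake"


# Patching the facade alias rebinds facade.User only; drive() still sees impl.User.
with mock.patch.object(facade, "User", FakeUser):
    via_facade = facade.drive()

# Patching the defining module by string path hits the actual lookup site.
with mock.patch("impl.User", FakeUser):
    via_impl = facade.drive()

print(via_facade, via_impl)  # real fake
```

The same reasoning applies to monkeypatch.setattr: the target has to be the namespace in which the function under test resolves the name.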
_maybe_check_bounds (in procedure/experiment.py) imports find_latest_envelope lazily from procedure/tuning inside the function body — keeps experiment.py free of an outgoing import-time dependency on tuning.py.
R3 (cost minimisation, conditional on R1 ∧ R2) had threshold: null and was a derived flag (R3.pass = R1.pass and R2.pass); the user wanted to focus the case-study on the two thresholded requirements. Stripped from:
- data/reference/baseline.json: only R1 + R2 remain in requirements.
- src/analytic/metrics.py: check_reqs signature drops the cost kwarg; verdict dict {R1, R2, R3} → {R1, R2}.
- src/analytic/__init__.py, src/methods/{analytic,stochastic,dimensional}.py, src/io/config.py, src/experimental/prototype/target/service/qos.py: narrative + docstring touch-ups.
- Notebooks 01 / 02 / 03 / 05: intro inputs/outputs, summary table column, verdict section title, verdict table column, summary narrative all R3-free.
- tests/analytic/test_metrics.py: removed test_cost_recorded_but_not_thresholded; renamed test_r3_threshold_is_null_in_reference → test_reference_has_only_r1_r2.
- tests/methods/test_analytic.py + test_stochastic.py: shape assertions updated.
New feature anchoring each adaptation's (theta, sigma, eta, phi) dimensional verdict onto 04-yoly.ipynb's swept design space.
New API:
- src/dimensional/reshape.py::load_dim_op_points: reads saved data/results/dimensional/<adp>/<profile>.json and aggregates each adp's per-artifact coefficients via coefs_to_net (mean reducer; matches 03-dimensional.ipynb §2).
- src/dimensional/reshape.py::load_dim_op_points_per_node: per-artifact variant for the per-node arts_behaviour overlay.
- src/view/charter.py::plot_yoly_with_op_points: wraps plot_yoly_chart (kind="chart") or plot_yoly_space (kind="space"). X markers (s=180, black-edged, sentence-case bold labels with white α=0.6 bbox) + dim-grey dashed trajectory line connect baseline → s1 → s2 → aggregate.
- src/view/charter.py::plot_yoly_arts_with_op_points: per-node 3D variant.
- legend_ncol_cap threaded through plot_yoly_chart / plot_yoly_space; the wrapper extends the footer legend with one X-marker proxy per adaptation (suffix (op), same colour as the cloud).
Critical bug solved (worth pinning): the yoly sweep painter assigns scenario colours by sorted(paths.keys()), not insertion order. With scenarios={"No Adaptation": ..., "S1: Retry": ..., "S2: Select-Reliable": ..., "S1 & S2": ...} the alphabetical sort puts "S1 & S2" BEFORE "S1: Retry" (space 0x20 < colon 0x3A), so a naive wrapper iterating op_points in insertion order assigns the wrong colour to every adp except baseline. _resolve_op_point_colors(adps, scenarios, paths) mirrors the sweep's sort + zip-by-position to fix this.
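The ordering trap can be reproduced in a few lines. The sketch below is illustrative only: `palette` is a hypothetical colour cycle and `resolve_op_point_colors` is a simplified stand-in for _resolve_op_point_colors (the real helper also takes adps and paths).

```python
# Scenario label -> adp key, in the insertion order used by the notebook.
scenarios = {
    "No Adaptation": "baseline",
    "S1: Retry": "s1",
    "S2: Select-Reliable": "s2",
    "S1 & S2": "aggregate",
}
palette = ["C0", "C1", "C2", "C3"]  # hypothetical colour cycle

# The sweep painter colours by sorted label, so "S1 & S2" precedes "S1: Retry"
# (space 0x20 sorts before colon 0x3A).
sweep_order = sorted(scenarios)

# Naive wrapper: zip colours against insertion order (wrong for 3 of 4 adps).
naive = dict(zip(scenarios.values(), palette))


def resolve_op_point_colors(scenarios, palette):
    """Give each adp the colour the sweep assigned to its scenario label."""
    by_label = dict(zip(sorted(scenarios), palette))
    return {adp: by_label[label] for label, adp in scenarios.items()}


fixed = resolve_op_point_colors(scenarios, palette)
print(sweep_order)
print(naive)
print(fixed)
```

Only the baseline entry agrees between `naive` and `fixed`, which matches the symptom described above.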
Notebook restructure (04-yoly.ipynb) — final 6-section layout:
- §1: Sweep all 4 adaptations (single loop).
- §2: Per-adp arch chart + space (loop).
- §3: Per-node grids (baseline + aggregate only — keeps figure count manageable).
- §4: 4-way cmp arch overlay with op-points.
- §5: 2-way baseline-vs-aggregate cmp overlay with op-points.
- §6: Per-node 3D arts_behaviour with op-points.
Legacy plot_node_heatmap(ndss, names, nodes, ...) filtered each panel's rows by _df[cname].isin(nodes) — so when adaptations swapped services at the same row position (baseline MAS_{3} ↔ s2 MAS_{4}), the s2 panel silently dropped the swap-slot rows. Confirmed in data/img/analytic/s2/heatmap_vs_baseline.png (baseline = 13 rows, s2 = 10).
Refactor: dropped the nodes parameter; each panel renders max(len(df) for df in ndss) rows; shorter panels NaN-pad; per-row cname value comes from _df.iloc[i][cname] so each panel's y-axis labels come from its own column. Swap-slot rows now show MAS_{3} on baseline panel and MAS_{4} on s2 panel at the same row position. Suppressed seaborn's "key" y-axis title via _ax.set_ylabel(""). plot_node_diffmap gained a y_labels kwarg for per-row label override.
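A dependency-free sketch of the positional-alignment contract, assuming each panel is a list of (label, value) rows. The real plotter works on DataFrames; `align_panels` and the numbers below are hypothetical and only illustrate the padding and per-panel labelling.

```python
import math


def align_panels(panels):
    """Pad every panel to the longest one so row i is the same slot everywhere."""
    n_rows = max(len(panel) for panel in panels)
    pad = (None, math.nan)  # empty slot: no label, NaN value
    return [list(panel) + [pad] * (n_rows - len(panel)) for panel in panels]


# Illustrative rows; the values are made up.
baseline = [("MAS_{1}", 0.31), ("MAS_{3}", 0.52), ("AS_{3}", 0.18)]
s2 = [("MAS_{1}", 0.30), ("MAS_{4}", 0.07)]

# Legacy behaviour: filtering by a shared node list drops the swap-slot row.
shared_nodes = {label for label, _ in baseline}
legacy_s2 = [row for row in s2 if row[0] in shared_nodes]

# Refactored behaviour: align by position, label each panel from its own rows.
aligned = align_panels([baseline, s2])
y_labels = [[label or "" for label, _ in panel] for panel in aligned]
print(legacy_s2)
print(y_labels)
```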
All four notebooks (01-analytic, 02-stochastic, 03-dimensional, 05-experimental) dropped their _slot workaround. New tests/view/test_diagrams.py (5 tests) pins the positional-alignment + per-panel-labels contracts.
| Stage | Count | Delta |
|---|---|---|
| Pre-session | 550 | — |
| After plotter fix + notebooks | 555 | +5 (new test_diagrams.py) |
| After R3 stripped | 554 | -1 (deleted test_cost_recorded_but_not_thresholded) |
| After yoly overlay | 561 | +7 (new test_charter_overlay.py) |
| After stage 8.0 extraction | 561 | ±0 (zero regression, monkeypatches migrated) |
Open: stage 8.1 (remote deployment) sits behind the now-clean facade. data/config/method/prototype/hosts.json + starter daemon + placement-aware bring_up_mesh is the next chunk. Plan at C:\Users\Felipe\.claude\plans\lets-go-to-plan-velvet-meteor.md.
2026-05-12 — Stage 7 closed: thin notebook + observed aggregator + multi-worker mesh + Flask target + open-loop driver
Decision. Close stage 7 of the src/experimental/ rebuild. The 16-grid notebook 05-experimental.ipynb drives (adp x framework x granularity) end-to-end under dpl="multiprocess" and reuses the analytic plotters per (framework, granularity) slice. verdict.json::mesh echoes the per-atomic (c, K, mu, eps) actually applied so stage 9 can verify cross-method parity. 550 pytest green.
Planned scope landed (c/K + aggregator + notebook):
- _admission_lt_from_profile(adp) + _resolve_admission(...) lift per-svc (c, K) from profile.specs[svc_id]; threaded through _build_mesh_specs so each atomic spawns with its profile-declared caps.
- _build_mesh_admission(...) + compute_verdict(mesh_admission=...) echo {svc_id: {c, K, mu, eps}} into verdict.json::mesh (TAS_{1} composite + TAS_{2..6} internal stages when expanded).
- observed_nodes_from_run(..., composite_op=...) aggregates per-pid CSVs into an analytic-shaped nodes DataFrame; composite TAS_{1} synthesised from verdict.operational since the composite writes flow JSONL not per-pid CSV.
- 05-experimental.ipynb: 16-row summary / verdict tables, plots per (framework, granularity) slice. R3 dropped (cost is the comparison method's job).
Unplanned (multi-worker mesh):
- New \w_{<svc>} PACS Variable in profile/{dflt,opti}.json::specs (default w=2; tried w=4 first but laptop CPU thrashed during cold-start storm). ArtifactSpec.w + _workers_lt_from_profile(adp) lift the map.
- MeshSpec.workers: int = 1; bring_up_mesh spawns N processes per spec on consecutive ports and yields dict[svc_id, list[str]]. _find_contiguous_block picks an N-port window.
- ServiceDescription.urls: tuple[str, ...] (parallel to legacy endpoint); ServiceClient._pick_url round-robins per service via a per-instance counter.
- pick_free_port floors at MIN_USER_PORT = 8000 (above the system port range and the low registered ports) + auto-rotates so TIME_WAIT on Windows doesn't block back-to-back iterations.
- target.json::ready_timeout_s: 20.0 -> 90.0 (cold-start storm) + controller.ready_timeout_s: 5.0 -> 20.0.
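The two mechanical pieces of the multi-worker mesh, the N-port window and the per-service round-robin, can be sketched as below. Both helpers are hypothetical simplifications: the real _find_contiguous_block probes sockets rather than a `busy` set, and the URLs are placeholders.

```python
import itertools


def find_contiguous_block(busy, n, start=8000, stop=65535):
    """Return the first port of the lowest window of n consecutive free ports."""
    run_start, run_len = start, 0
    for port in range(start, stop + 1):
        if port in busy:
            run_start, run_len = port + 1, 0  # window broken, restart after it
            continue
        run_len += 1
        if run_len == n:
            return run_start
    raise RuntimeError(f"no free window of {n} ports in [{start}, {stop}]")


class RoundRobinUrls:
    """Per-instance counters, one per service, cycling through worker URLs."""

    def __init__(self, urls_by_svc):
        self._urls = {svc: tuple(urls) for svc, urls in urls_by_svc.items()}
        self._counters = {svc: itertools.count() for svc in self._urls}

    def pick(self, svc_id):
        urls = self._urls[svc_id]
        return urls[next(self._counters[svc_id]) % len(urls)]


first_port = find_contiguous_block(busy={8001}, n=2)
picker = RoundRobinUrls({"MAS_{1}": ["http://127.0.0.1:8002", "http://127.0.0.1:8003"]})
picks = [picker.pick("MAS_{1}") for _ in range(3)]
print(first_port, picks)
```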
Unplanned (open-loop trial driver):
The closed-loop "one User per TAS_{1} worker" pattern saturated the apparatus at ~33 req/s. At the case-study design point lambda_z = 345 req/s this is a 10x undershoot that would look like a slow trial rather than a verdict-level operational finding. Replaced with an open-loop producer / consumer over asyncio.Queue:
- _dispatch_at_rate(queue, n_requests, rate, stop_event, ...) pushes ticks at request_rate_per_s regardless of server speed; drift-corrected pacing via start + i/rate targets.
- _consume_payloads(consumer_id, base_url, queue, ...) consumers each own one User + httpx.AsyncClient; exit on None sentinel or stop_event.
- _drive_trial(tas_urls, ..., consumer_pool_size=64, max_queue_depth=1000, drain_timeout_s=60.0) spawns N consumers + 1 dispatcher; drains via asyncio.gather under a drain_timeout_s bound.
- consumer_pool_size = 64 kept below httpx's default max_connections = 100 to avoid pool throttling.
- max_queue_depth = 1000 is the back-pressure ceiling: when consumers stall, queue.put() blocks and the offered rate slips below request_rate_per_s, which surfaces as a real X_0_req_per_s undershoot in the verdict.
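The producer / consumer shape described above can be sketched with the standard library alone. This is not the project's driver: the function names drop the leading underscore to mark them as stand-ins, the consumer replaces the HTTP call with a no-op, and the drain timeout is applied with asyncio.wait_for because asyncio.gather itself takes no timeout argument (whether the project does the same is an assumption).

```python
import asyncio
import time


async def dispatch_at_rate(queue, n_requests, rate, stop_event):
    """Enqueue ticks on an absolute schedule so pacing error does not accumulate."""
    start = time.perf_counter()
    sent = 0
    for i in range(n_requests):
        if stop_event.is_set():
            break
        delay = start + i / rate - time.perf_counter()  # target time of tick i
        if delay > 0:
            await asyncio.sleep(delay)
        await queue.put(i)  # blocks when the queue is full (back-pressure)
        sent += 1
    return sent


async def consume(queue, results):
    """Drain the queue until a None sentinel arrives."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # placeholder for the real request
        results.append(item)


async def drive_trial(n_requests=50, rate=500.0, pool_size=4,
                      max_queue_depth=10, drain_timeout_s=5.0):
    queue = asyncio.Queue(maxsize=max_queue_depth)
    stop_event = asyncio.Event()
    results = []
    consumers = [asyncio.create_task(consume(queue, results)) for _ in range(pool_size)]
    sent = await dispatch_at_rate(queue, n_requests, rate, stop_event)
    for _ in consumers:
        await queue.put(None)  # one sentinel per consumer
    await asyncio.wait_for(asyncio.gather(*consumers), timeout=drain_timeout_s)
    return sent, sorted(results)


sent, done = asyncio.run(drive_trial())
print(sent, len(done))
```

Scheduling against `start + i / rate` rather than sleeping `1 / rate` after each tick is what keeps the offered rate from drifting when individual sleeps overshoot.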
Baseline stop predicate switched from R1 AND R2 to R1 OR R2 (_STOP_PREDICATES["baseline"] = ("r1_breach", "r2_breach")). The intent: in baseline we want to stop on ANY single-requirement breach (this is the "no adaptation" reference; if EITHER requirement is breached the run is already a failure). Per-adp: s1 -> ("r1_breach",), s2 -> ("r2_breach",), aggregate -> ("r1_breach", "r2_breach").
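Read as data, the per-adp predicates reduce to an "any listed flag is set" check. The mapping below copies the tuples quoted above; `should_stop` and the shape of `flags` are assumptions made for the sketch.

```python
STOP_PREDICATES = {
    "baseline": ("r1_breach", "r2_breach"),
    "s1": ("r1_breach",),
    "s2": ("r2_breach",),
    "aggregate": ("r1_breach", "r2_breach"),
}


def should_stop(adp, flags):
    """Stop as soon as any breach flag listed for this adp is raised."""
    return any(flags.get(name, False) for name in STOP_PREDICATES[adp])


only_r2 = {"r1_breach": False, "r2_breach": True}
print({adp: should_stop(adp, only_r2) for adp in STOP_PREDICATES})
```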
Unplanned (apparatus mu = ideal mu). profile/{dflt,opti}.json::specs[<svc>]::\mu_{<svc>}._setpoint lifted to match artifacts[<svc>]::\mu_{<svc>}._setpoint (case-study ideal). Specs no longer declares a fractional apparatus floor; the experiment now runs over the same service rates the analytic / dim / stoch methods compute over. Values: TAS_{1..6}=700, MAS_{1}=180, MAS_{2}=530, MAS_{3}=150, MAS_{4}=880, AS_{1}=700, AS_{2}=410, AS_{3}=1580, AS_{4}=210, DS_{1}=250, DS_{3}=550. NB: s2 swap AS_{3}->AS_{4} has mu_AS4 = 210 < mu_AS3 = 1580 (classic throughput-for-reliability trade); MAS_{3}->MAS_{4} is a pure upgrade.
Unplanned (Flask target + controller). ABC route bases land per stage: AtomicRoutesBase (third_party.py), InternalStageRoutesBase (internal_stage.py), TasRoutesBase (tas.py), ControllerRoutesBase (controller/app.py). FastAPI + Flask subclasses each. New build_*_flask_app builders. AsyncLoopThread (target/factory/async_bridge.py) hosts a daemon-thread asyncio loop so waitress worker threads share K + c gating + httpx client. SyncSamplePoller (controller/poller.py) is the threading-based mirror of SamplePoller for the Flask controller.
Unplanned (notebook polish). Collapsed mode uses an empirical star routing (TAS_{1} -> atomic_i weighted by observed lambda) since the profile routing matrix disconnects TAS_{1} from atomics when TAS_{2..6} are absent. Heatmap / diffmap rows are union-padded across all 4 adps per (fw, gr) slice so swap-slot scenarios (s2 swaps MAS_{3}->MAS_{4} etc.) don't drop rows. Nodes reorder to cfg.list_node_keys() workflow order rather than alphabetical. All variant runs land at data/results/experimental/<adp>_<framework>_<granularity>/ (single underscore between every segment; the variant suffix is always present so there is no bare canonical path).
Calibration override. Latest envelopes for localhost + multiprocess carry r_max_req_s = 350 (measured 328) with an _overrides block recording the manual value and rationale: the case-study lambda_z = 345 requires a bounds gate above the host's hardware ceiling so the trial actually runs; the undershoot then lands in verdict.operational.X_0_req_per_s as a real experimental finding.
Tests + verification. 550 pytest passing. 7 new open-loop driver tests (_dispatch_at_rate flood / pacing / stop / breach; _consume_payloads sentinel / records; _drive_trial drain timeout) all kept inside the single TestExperimental class (user feedback: "lets try to keep as few tests clases as possible"). At w=2: collapsed mesh = 17 procs (8 svcs x 2 + 1 controller), expanded = 27 (13 x 2 + 1).
Out of scope going forward. R1/R2 trajectory plotters over window.parquet (analytic / dim render only scalar snapshots; experimental follows). Multi-trial repetition. Multi-worker breach merging in the controller (polls tas_urls[0] only; verdict R1/R2 unaffected since they come from the merged flow JSONL). R3 in the experimental notebook.
Open. Stage 8 = dpl="remote" in procedure/deployment.py::_resolve_ports + cross-host spawning. Stage 9 = src/methods/comparison.py + 06-comparison.ipynb (yoly chart over verdict.json from all four methods).
Decision. Close stage 6 of the src/experimental/ rebuild. The experimental method now drives the four case-study adaptation strategies (baseline, S1 retry, S2 prefer-reliable, aggregate) against their own physical meshes, with a thin MAPE-K controller monitoring R1 / R2 in a rolling window and a verdict landing in data/results/experimental/<adp>/verdict.json. End-to-end demo (python -m tests.demo.experiment) produces distinct verdicts per adp.
Architecture (three subsystems spawned by run_experiment):
- Target system (TAS_1 + atomic mesh): unchanged dispatch path; new endpoints GET /samples?since=<offset> (pull-style probe buffer) and POST /config (install picker on live workflow engine). recent_samples deque on app.state holds the last 1024 records.
- Controller (independent uvicorn process): runs the controller FastAPI app. SamplePoller asyncio task pulls /samples every poll_interval_ms and feeds records into a rolling-window deque + unbounded history list. Exposes GET /aggregates (running R1 / R2 + breach flags) and GET /history (full trajectory for window.parquet).
- Client: unchanged load generator.
Orchestrator polls /aggregates every orchestrator_poll_every_n requests and applies a strategy-specific stop predicate (baseline = R1 AND R2; s1 = R1; s2 = R2; aggregate = R1 OR R2). Trial halts on first breach post-warmup or after N requests.
Strategies (src/experimental/prototype/controller/strategies.py). Picker contract widened to (svc_kind, operation, catalogue) -> list[ServiceCatalogueEntry] because _routs is anchored per dispatching stage (so the same kind can have different weights at TAS_{3} vs TAS_{4}). Four classes:
| Picker | adp | First pick | Retry behaviour | Honours _routs? |
|---|---|---|---|---|
| FirstOfKindPicker | baseline | weighted-random per _routs[operation] | none | yes |
| RetryOnFailurePicker | s1 | weighted-random | drop failed svc_id, renormalise, weighted-random over remaining; up to max_attempts | yes |
| PreferReliablePicker | s2 | argmin observed failure rate | none | no (explicit override) |
| RetryAndPreferReliablePicker | aggregate | argmin observed failure rate | next-most-reliable; up to max_attempts | no (explicit override) |
Reliability-aware pickers expose observe(svc_id, success); the engine calls it after every attempt to update the rolling window. extract_op_weights(routs, nodes, stage_routes, scenario=adp) builds the per-operation weight table from the active profile's routing matrix.
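The s1 retry rule (drop the failed svc_id, renormalise, draw again) can be sketched as follows. `pick_with_retry` is a hypothetical free function, not the RetryOnFailurePicker class, and the weights are illustrative. random.choices normalises relative weights itself, so deleting the failed entry is all the renormalisation the sketch needs.

```python
import random


def pick_with_retry(weights, attempt, max_attempts=3, rng=random):
    """Weighted-random pick; on failure drop that svc_id and redraw from the rest."""
    remaining = dict(weights)
    tried = []
    for _ in range(max_attempts):
        if not remaining:
            break
        ids = list(remaining)
        svc_id = rng.choices(ids, weights=[remaining[i] for i in ids], k=1)[0]
        tried.append(svc_id)
        if attempt(svc_id):
            return svc_id, tried
        del remaining[svc_id]  # implicit renormalisation over the survivors
    return None, tried


# Illustrative weights; MAS_{1} always fails in this toy run.
weights = {"MAS_{1}": 0.9, "MAS_{2}": 0.1}
picked, tried = pick_with_retry(
    weights, attempt=lambda svc_id: svc_id != "MAS_{1}", rng=random.Random(0)
)
print(picked, tried)
```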
Per-adp mesh (the case study's S2 = service-instance swap). Each adp loads its own service set from _nodes: baseline + s1 share MAS_3 / AS_3 / DS_3; s2 + aggregate use MAS_4 / AS_4 / DS_1. The catalogue is the weyns_iftikhar_2016 15-service superset; _build_mesh_specs iterates catalogue.entries ∩ mu_lt.keys() so each adp only spawns its own 7 atomics. build_tas_fastapi_app filters the workflow engine's catalogue down to the spawned mesh so by_kind only returns reachable services.
Server-side ε injection (AtomicRoutes._maybe_inject_failure). Each atomic carries eps (from profile.specs[svc_id].epsilon) + failure_mix (from failure_modes.json). At the top of every request, the route draws random.random() against eps; if it lands inside, a mechanism is picked from the mix and stamped onto the payload. Client-set inject_failure always wins (the route only fills None). Stack-trace noise from the drop mechanism (the _aborted_body generator's RuntimeError("synthetic drop mid-stream")) is silenced by a logging.Filter on the uvicorn.error logger installed at failure.py module load.
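A minimal sketch of the injection rule, assuming the payload is a plain dict. `maybe_inject_failure` is a free-function stand-in for the route method, and the mechanism names and weights in `failure_mix` are hypothetical (only the drop mechanism is named in this entry).

```python
import random


def maybe_inject_failure(payload, eps, failure_mix, rng=random):
    """Stamp a failure mechanism with probability eps unless the client set one."""
    if payload.get("inject_failure") is not None:
        return payload  # client-set value always wins
    if rng.random() < eps:
        modes = list(failure_mix)
        mode = rng.choices(modes, weights=[failure_mix[m] for m in modes], k=1)[0]
        return {**payload, "inject_failure": mode}
    return payload


failure_mix = {"http_5xx": 0.5, "drop": 0.3, "timeout": 0.2}  # hypothetical mix
rng = random.Random(42)

never = maybe_inject_failure({"inject_failure": None}, 0.0, failure_mix, rng)
always = maybe_inject_failure({"inject_failure": None}, 1.0, failure_mix, rng)
client = maybe_inject_failure({"inject_failure": "drop"}, 0.0, failure_mix, rng)
print(never, always, client)
```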
R1 threshold corrected to 0.01 (Weyns & Calinescu 2015 SEAMS Step 1: "TAS service invocations that fail to complete successfully is less than 1 %"). The previous 0.0003 came from Cámara 2023 (Fig. 1(b): "0.03 %"), but Cámara's per-service failure rates are also in percent (column header Fail.rate (%)), so the threshold was 100× too strict for our profile values which are Weyns-style fractions (e.g., MAS_1 = 0.12 = 12 %). The project anchors to Weyns 2015 for the failure-rate scale.
Verdict = Denning & Buzen 1978 operational analysis over the flow JSONL: A (arrivals), C (completions), F (failures), T_s (observation window), X_0 (throughput), R_s (mean response time). R1 = F/A, R2 = R_s. Plus pass flags against thresholds + stop reason. window.parquet carries the per-sample running-R1/R2 trajectory for stage 7+ plots.
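The operational quantities follow directly from per-request records. The sketch below assumes hypothetical record fields (`t_start_s`, `t_end_s`, `ok`), takes T_s as the span from first arrival to last departure, and averages R_s over completed requests only; the project's flow JSONL schema and averaging choice may differ.

```python
def operational_analysis(records):
    """Compute A, C, F, T_s, X_0, R_s and the R1 / R2 values from flow records."""
    A = len(records)                       # arrivals
    done = [r for r in records if r["ok"]]
    C = len(done)                          # completions
    F = A - C                              # failures
    T_s = max(r["t_end_s"] for r in records) - min(r["t_start_s"] for r in records)
    X_0 = C / T_s                          # throughput = completions / window
    R_s = sum(r["t_end_s"] - r["t_start_s"] for r in done) / C
    return {"A": A, "C": C, "F": F, "T_s": T_s, "X_0": X_0, "R_s": R_s,
            "R1": F / A, "R2": R_s}


# Illustrative records, not measured data.
records = [
    {"t_start_s": 0.0, "t_end_s": 0.1, "ok": True},
    {"t_start_s": 0.5, "t_end_s": 0.7, "ok": True},
    {"t_start_s": 1.0, "t_end_s": 1.1, "ok": False},
    {"t_start_s": 1.5, "t_end_s": 2.0, "ok": True},
]
verdict = operational_analysis(records)
print(verdict)
```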
End-to-end demo (python -m tests.demo.experiment) at N=100:
| adp | R1 | R2_ms | stop_reason |
|---|---|---|---|
| baseline | 0.10 | 53.0 | n_reached |
| s1 retry | 0.04 | 245.4 | n_reached |
| s2 prefer-reliable | 0.00 | 42.5 | n_reached |
| aggregate (S1 + S2) | 0.00 | 136.5 | n_reached |
S1 halves baseline R1 (Weyns 2015 Table IV confirms this ratio); S2 + aggregate reach 0 failures on this small N. R2 climbs because retries / smart picks add per-request hops; the strategy comparison stays methodologically clean because R2 is the same axis for everyone.
Test surface. 386 tests pass across tests/experimental/ + tests/methods/ (was 348 before stage 6 started). 38 new tests in tests/experimental/prototype/controller/ (strategies + verdict + app + poller) + 2 new endpoint tests in tests/experimental/prototype/target/factory/test_tas.py + 4 new filter tests in tests/experimental/prototype/target/factory/test_failure.py.
Refactor pattern locked. TasRoutes, ControllerRoutes, AtomicRoutes all follow the same class-based-routes shape: module-scope class holds the per-app immutable settings, route methods are bound async methods. No nested defs inside factory functions; pickle-friendly across multiprocessing.spawn.
Next. Stage 7: experimental notebook + R1/R2 trajectory plotters reading window.parquet. Drives run_experiment across the four adps from 05-experimental.ipynb. Also: thread per-atomic c / K from profile.specs into the mesh (today they're null in target.json::atomic_admission) and echo the applied c/K into verdict.json, so the stage-9 yoly chart can verify analytic / dim / stoch / experimental all run over identical M/M/c/K parameters.
Permanently out of scope (don't re-open in later stages):
- MAPE-K Analyse + Plan phases. The case study only gives the four decisions (the adp labels); we execute them, we don't derive them. Controller stays at Monitor + Execute.
- Mid-trial picker swaps. POST /config fires once at trial start. A live-reconfiguration demo would be a separate artefact, not part of the case-study replication.
Deferred (may revisit): multi-trial repetition with statistical envelopes around R1 / R2.
Decision. Add two new switches on top of stage 5 (target.json + CLI): target_granularity ∈ {collapsed, expanded} and inject_internal_stage_mu ∈ {false, true}. Default = (collapsed, false), byte-identical to the previous stage-5 surface. Expanded mode promotes the conceptual stages TAS_{2..6} into real uvicorn workers so per-stage queue length / response time / CSV become directly observable, at the cost of 5 extra HTTP hops per request and a ~13-spawner mesh.
Mesh. Collapsed = 8 spawners (1 composite TAS_1 + 7 third-parties). Expanded = 13 spawners (TAS_1 + 5 internal-stage atomics TAS_{2..6} + 7 third-parties). Stage mapping: TAS_{2} → medical_analysis (analyseData); TAS_{3} → alarm (triggerAlarm); TAS_{4} → alarm (sendAlarm); TAS_{5} → drug (changeDrug); TAS_{6} → drug (changeDose).
Class hierarchy stayed Fig.3-faithful. New subclass TasInternalAtomic(AtomicService) for TAS_{2..6}. AtomicService and CompositeService remain siblings under AbstractService; no reparenting; _handle contract unchanged. The composite never does work itself in either mode.
μ asymmetry. Third-party atomics (MAS / AS / DS) always sleep on μ. TAS framework atomics (TAS_{2..6}) only sleep when inject_internal_stage_mu=True. Default OFF so the extra HTTP hops show up cleanly in latency comparisons before adding μ-fidelity overhead.
Workflow loader + engine gained the svc_id step-form. Previously every step targeted a catalogue svc_kind (picker resolves to a concrete id). Steps may now alternatively carry svc_id for direct cache lookup (no picker). Exactly one of svc_kind / svc_id must be set; the loader rejects both / neither. Collapsed loads workflow/tas.json (svc_kind throughout); expanded loads a separate workflow/tas_expanded.json (svc_id throughout). Two architecturally distinct files on disk, no translation or in-memory synthesis.
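The "exactly one of svc_kind / svc_id" rule is an exclusive-or check. A minimal sketch, assuming steps arrive as dicts; `build_step` here is a hypothetical simplification of the loader's _build_step and the returned shape is invented for illustration.

```python
def build_step(raw):
    """Accept a step that names svc_kind or svc_id, but never both or neither."""
    has_kind = raw.get("svc_kind") is not None
    has_id = raw.get("svc_id") is not None
    if has_kind == has_id:  # both set or both missing
        raise ValueError("step must set exactly one of svc_kind / svc_id")
    if has_kind:
        return {"operation": raw["operation"], "via": "picker", "target": raw["svc_kind"]}
    return {"operation": raw["operation"], "via": "cache", "target": raw["svc_id"]}


collapsed_step = build_step({"operation": "analyseData", "svc_kind": "medical_analysis"})
expanded_step = build_step({"operation": "analyseData", "svc_id": "TAS_{2}"})
print(collapsed_step, expanded_step)
```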
Orchestrator wiring (src/methods/experimental.py). _build_mesh_specs became a small dispatcher that emits 8 or 13 MeshSpecs. Port layout: TAS at tas_base_port; in expanded mode the next 5 ports cover TAS_{2..6}; then the sorted atomic ids. bring_up_mesh is unchanged, already topology-agnostic.
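The port layout can be sketched as a pure function. `layout_ports` and the base port 9000 are hypothetical, and the sketch assumes one worker per service with the atomics starting immediately after the last TAS port.

```python
def layout_ports(tas_base_port, atomic_ids, expanded):
    """TAS_{1} first, then TAS_{2..6} when expanded, then atomics in sorted order."""
    ports = {"TAS_{1}": tas_base_port}
    next_port = tas_base_port + 1
    if expanded:
        for k in range(2, 7):
            ports[f"TAS_{{{k}}}"] = next_port
            next_port += 1
    for svc_id in sorted(atomic_ids):
        ports[svc_id] = next_port
        next_port += 1
    return ports


atomics = ["MAS_{1}", "MAS_{2}", "MAS_{3}", "AS_{1}", "AS_{2}", "AS_{3}", "DS_{3}"]
collapsed = layout_ports(9000, atomics, expanded=False)
expanded = layout_ports(9000, atomics, expanded=True)
print(len(collapsed), len(expanded))  # 8 13
```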
Demo. tests/demo/granularity.py runs all three modes (collapsed, expanded mu off, expanded mu on) back-to-back with n_requests=5, prints a per-run worker table (svc, OS PID, host:port, parsed from the per-pid CSV filenames + the first JSONL flow record for TAS_1), and a side-by-side comparison. CLI smoke verified all three modes produce 100 % success at low load.
Apparatus-wide cleanup pass. 19-module sweep across src/experimental/prototype/target/ + src/experimental/procedure/deployment.py:
- Module + class docstrings stripped of stage references, em-dashes, and source/objective/deviation blocks (technobabble); kept the substance, dropped the framing.
- Single-return rule applied across catalogue.py / workflow loader / workflow engine / failure.py wherever multi-return blocks lingered (lookup, load_catalogue, _parse_catalogue, _build_entry, _build_step, apply_inject_failure, _dispatch, _step_to_dict).
- Renames: _AdmissionGate → AdmissionGate; _AtomicRoutes → AtomicRoutes; _maybe_log_csv → _log_csv_row (twice, in third_party.py and internal_stage.py); _handle_with_status → _dispatch (shorter).
- Fixed pyright Never error in internal_stage.py by deleting the dead if not isinstance(_body, dict): _body = ... branch; the return type already guaranteed dict[str, Any], so the branch was unreachable and the assignment narrowed _body to Never.
- AdmissionGate.acquire rewritten to single-return; redundant (body or {}).get(...) repetition in two _log_csv_row methods collapsed to a single _body = body or {} then plain .get(...) per field.
- tas.py helper docstrings (_lifespan, _post_root, _write_flow_record) gained complete Args / Yields / Returns sections.
- target/config.py aligned with the catalogue pattern: split the single DFLT_TGT_CFG_PATH into DFLT_TGT_CFG_DIR + DFLT_TGT_CFG_FILE, joined inside load_target_cfg. Test imports updated.
Test-side cleanup.
- Pyright complaints defused: frozen-dataclass writes routed through setattr(...) inside pytest.raises(FrozenInstanceError); _iter.__anext__() replaced with the aiter() + anext() builtins (Starlette's AsyncContentStream is typed loosely enough that the dunder is invisible to pyright).
- Unused imports dropped from test_engine.py (ServiceRegistry, ServiceCache, ServiceDescription, ServiceClient) and test_client.py (typing.Any). Missing source="" arg added to one ServiceCatalogue(...) literal.
- Test names shortened aggressively across 9 test modules: test_atomic_app_round_trip → test_round_trip; test_inject_failure_5xx_short_circuits → test_inject_5xx; test_dispatch_status_zero_becomes_502 → test_status_zero_to_502; test_first_of_kind_picker_no_match_raises → test_picker_no_match; and ~30 more. The *test_name()* ... docstring carries the contract; the function name is just a handle.
Test surface. 348 tests pass (tests/experimental/ + tests/methods/). No regressions.
Next. Stage 6: controller + adaptation strategy plug-ins + R1/R2/R3 verdict computation. The granularity switch sets us up to attach probes to TAS_{2..6} directly when running in expanded mode for stage-9 yoly comparisons.
Decision. Close stage 4 of the src/experimental/ rebuild. The calibration ping/echo apparatus, the gate / report layer, the per-probe + summary plotters, the notebook-driven workflow, and the multi-process load generator are all wired end-to-end. The apparatus produces a per-deployment envelope JSON and a 2x3 figure (single) / 2x4 figure (overlay) characterising clock + scheduler + kernel + handler + worker scaling floors, plus the operating envelope (c_max, r_max, w_max). Test surface holds at 416+ green across the repo; calibration namespace alone is at 94 tests.
Probes (six, all adaptive / self-terminating).
- probe_timer: clock floor (median delta between consecutive perf_counter_ns reads).
- probe_jitter: scheduler floor (asyncio.sleep overshoot vs target, wrapped in windows_timer_resolution(1)).
- probe_loopback: kernel TCP floor; 32 KiB payload via a _recv_exact() accumulator (recv() returns up to N bytes; at kB-scale the response spans multiple syscalls and the timer must wait for all of them).
- probe_handler_scaling: asyncio multiplexing floor; adaptive ramp start / stop / step, halts when the median drifts beyond max_drift_pct. Localhost only (mode-invariant; re-running on multiprocess produces duplicate numbers).
- probe_rate: per-worker saturation knee; start / stop / step ramp, halts on target_loss_pct or max_p95_latency_us. Always runs at workers=1 (mode-independent per-worker measurement).
- probe_workers_scaling: parallel-limit knee; ramps bring_up(workers=n), drives n * rate_per_worker_factor * saturation_rate aggregate, halts on min_eff_pct. Multiprocess only.
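The accumulator mentioned for the loopback probe exists because a single recv() may return fewer bytes than requested. A sketch of the pattern, assuming a blocking socket; `recv_exact` is a stand-in written for this note, and the socketpair demo replaces the real TCP loopback.

```python
import socket
import threading


def recv_exact(sock, n_bytes):
    """Keep calling recv() until exactly n_bytes have arrived."""
    chunks = []
    remaining = n_bytes
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than asked for
        if not chunk:
            raise ConnectionError(f"peer closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


left, right = socket.socketpair()
payload = b"x" * 32 * 1024  # 32 KiB, as in the loopback probe
sender = threading.Thread(target=left.sendall, args=(payload,))
sender.start()  # send from a thread so a small socket buffer cannot deadlock
received = recv_exact(right, len(payload))
sender.join()
left.close()
right.close()
print(len(received))
```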
Multi-process load generator (multi_proc_driver.py).
A single httpx.AsyncClient saturates around 1-1.5k req/s on Windows; past that the client misattributes its own busy-state as worker saturation. make_multi_proc_driver(n_clients) returns a RateDriver that fans the load across N ProcessPoolExecutor children, each running drive_at_rate_raw with its own httpx pool, and merges the raw latency lists before computing aggregate percentiles (median-of-medians is statistically invalid; merging the underlying samples first is correct). Default n_clients=8 is deliberately overdimensioned: we want the host-vernier interaction to fail before the client does. n_clients=1 short-circuits to the existing single-process driver.
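The median-of-medians point is easy to see with unequal sample counts per child. The latencies below are made up purely to illustrate the arithmetic.

```python
import statistics

# Illustrative per-child latency samples in microseconds (not measured data).
child_a = [10, 10, 10, 10, 10, 10, 10]  # busy child, 7 samples
child_b = [500]                          # starved child, 1 slow sample

# Each child's median gets equal weight regardless of how many samples back it.
median_of_medians = statistics.median(
    [statistics.median(child_a), statistics.median(child_b)]
)

# Merging the raw samples first weights every request equally.
merged_median = statistics.median(child_a + child_b)

print(median_of_medians, merged_median)  # 255.0 10.0
```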
Gate / report layer (gate.py, characterization.py).
- gate.verdict() returns a structured report, not a binary pass/fail. Sections: precision_band_us (quadrature sum of timer / jitter / loopback std-devs), verifiable_range (c_max / r_max_req_s / w_max), gates (handler / saturation / workers, each with the knee, not a runaway), floors (informational), summary (one mathtext-formatted headline per probe). The passed flag stays for downstream programmatic use but is no longer foregrounded; the user explicitly rejected the FAIL banner ("the verdict is dissertation territory; the tool's job is to measure and present").
- Report panel is a borderless attribute-vs-column table. Single-envelope panel: one column whose header is the host name; pre-table RUN got dropped, <host>: <datetime> lives in the panel title (parsed from make_run_id's <prefix>_<timestamp>_<nonce> shape). Overlay panel: one column per envelope with mode-aware bottom-left scaling slot (handler vs workers).
- Static three-block legend (Latency: / Floors: / Envelope:) explains the data-only rows in plain prose.
- Mathtext for math symbols ($\pm$, $\mu$s, $\leq$); the CLI helper _to_terminal() strips mathtext markers for stdout-friendly output.
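Two small pieces of the report layer lend themselves to a sketch: the quadrature sum behind precision_band_us and the mathtext strip for terminal output. Both functions are hypothetical reconstructions from the description above; in particular the replacement table in `to_terminal` is an assumption, not the contents of _to_terminal().

```python
import math


def precision_band_us(timer_std_us, jitter_std_us, loopback_std_us):
    """Combine independent std-devs in quadrature (root of the sum of squares)."""
    return math.sqrt(timer_std_us**2 + jitter_std_us**2 + loopback_std_us**2)


# Assumed mapping from mathtext commands to plain-text equivalents.
_PLAIN = {r"\pm": "+/-", r"\mu": "u", r"\leq": "<="}


def to_terminal(text):
    """Replace known mathtext commands, then drop the $ delimiters."""
    for tex, plain in _PLAIN.items():
        text = text.replace(tex, plain)
    return text.replace("$", "")


band = precision_band_us(3.0, 4.0, 12.0)
headline = to_terminal(r"$\pm$ 0.05 $\mu$s")
bound = to_terminal(r"$c \leq 8$")
print(band, headline, bound)
```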
Figure layout. Single-envelope summary stays 2x3 (timer / jitter / loopback / scaling-slot / rate / report). Cross-deployment overlay is 2x4 portrait (figsize=(14, 18)): timer / jitter, loopback / rate, handler-scaling / workers-scaling, report-spans-both-cols. Bottom Report cell is centred horizontally via an x_offset plumbed through the put-helpers.
Configuration knobs (calibration.json) — all probes parameterised from one JSON; runtime fallbacks in source.
```json
{
  "vernier": {"K": null, "c": null},
  "dpl": {"host": "127.0.0.1", "base_port": 9042, "workers": 4, "ready_timeout_s": 20.0},
  "hoststats": {
    "timer": {"samples_n": 1000},
    "jitter": {"samples_n": 100, "target_us": 1000},
    "loopback": {"samples_n": 100, "payload_bytes": 32768},
    "handler_scaling": {"start": 8, "stop": 1024, "step": 32, "samples_per_c": 50, "max_drift_pct": 5.0}
  },
  "rate": {"start": 8, "stop": 1024, "step": 32, "per_rate_s": 5.0, "target_loss_pct": 5.0, "max_p95_latency_us": 100000.0},
  "workers_scaling": {"start": 1, "stop": 32, "step": 1, "per_step_s": 5.0, "rate_per_worker_factor": 0.7, "rate_per_worker": 200, "min_eff_pct": 90.0, "n_clients": 8},
  "gate": {"noise_floor_pct": 5.0}
}
```

Notebook (00-calibration.ipynb).
Twelve thin cells. Imports + DPLS: list[Dpl] typed annotations + Dpl-typed DISPLAY map. Cell-3 runs run_calibration over ["localhost", "multiprocess"]. Cell-5 prints the structured report via _print_calibration_report. Cell-7 saves the per-deployment summary figures. Cell-9 saves the cross-deployment overlay. Cell-11 saves the per-probe standalone figures (handler_scaling guarded for localhost, workers_scaling guarded for multiprocess). Cell-2 markdown explains the deliberate n_clients=8 overdimensioning.
Conventions captured this round.
- Calibration c (handler concurrency, asyncio multiplexing axis) is not queueing-theory c (parallel servers in a station, M/M/c/K). Memory entry feedback_calibration_c_vs_queueing_c.md records the collision; the project's existing "do not conflate" rule applies. Calibration r = arrival rate per worker; calibration w = parallel worker processes. Report-panel labels use the verbose words (Concurrency:, Rate:, Workers:) to dodge the bare-c collision.
- make_run_id separator unified to a single _ between prefix and timestamp (was __). All disk paths and run-id parsers updated; _fmt_run_date is now shape-tolerant (scans for the YYYYMMDDTHHMMSSZ segment regardless of position).
- Handler scaling and workers scaling probes now follow the adaptive ramp pattern (start / stop / step / max_drift_pct or min_eff_pct), mirroring the rate sweep. Self-termination keeps slow-box runs to ~1-3 minutes per deployment; explicit fixed-list configs are gone.
- Gate / summary / report contract: data-only headlines (gate.summary[*].headline), no per-row interpretive prose, no FAIL banner. Interpretation lives in the dissertation, not the tool.
- Bar-chart panels (timer / jitter / loopback) use horizontal bars with ±s² error caps; loopback gets a translucent precision band [φ-s², φ+s²].
- Workers scaling panel: y-axis = "worker rate (req/s)", twin axis = "efficiency (%)", green-shaded verifiable region up to the stable knee.
- Greek-symbol convention pinned: median = $\Phi$ (uppercase), sample mean = $\overline{\chi}$ (chi with overline, NOT \hat{\chi} and NOT \bar{\chi}). Applies across timer / jitter / loopback / handler-scaling labels.
- Calibration panel axis labels finalised. Handler Scaling x = $c$: concurrent user requests, y = Latency change $[\mu s]$ (loopback floor subtracted from displayed values). Workers Scaling x = $w$: concurrent workers, y = worker rate (req/s). Rate Sweep x = target rate (req/s), y = $p_{95}$ latency $[\mu s]$.
- Overlay table polish: legend / table column body narrowed by 45 % (_OVERLAY_COL_START, _REPORT_LEGEND_BODY_X from 0.20 -> 0.12); the divider above Latency: now renders as an ax.plot() segment in axes coordinates (matched to table column span via div_left / div_right) instead of a glyph row.
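The shape-tolerant run-date parsing noted in the conventions can be sketched with a regex scan. `fmt_run_date` is a hypothetical stand-in for _fmt_run_date, the sample run ids are invented, and the output format is an assumption.

```python
import re
from datetime import datetime, timezone

# Matches the YYYYMMDDTHHMMSSZ segment wherever it sits in the run id.
_TIMESTAMP = re.compile(r"\d{8}T\d{6}Z")


def fmt_run_date(run_id):
    """Return a readable UTC timestamp, or None when no segment is found."""
    match = _TIMESTAMP.search(run_id)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(0), "%Y%m%dT%H%M%SZ")
    return stamp.replace(tzinfo=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


new_shape = fmt_run_date("calib_20260512T101500Z_ab12")   # single underscore
old_shape = fmt_run_date("calib__20260512T101500Z_ab12")  # legacy double underscore
print(new_shape, old_shape, fmt_run_date("no-timestamp-here"))
```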
Stop-gate verification.
- `pytest tests/` -> 416+ passed (94 in calibration namespace; rest unaffected).
- `python -m tests.demo.calibration` runs end-to-end on `dpl=localhost`; prints the structured report; writes one envelope JSON.
- `data/img/calibration/<dpl>/{timer,jitter,loopback,handler_scaling,rate_sweep,summary}.{png,svg}` produced; `data/img/calibration/comparison/overlay.{png,svg}` produced.
- CLI mathtext strip verified: `_print_calibration_report` outputs `+/- 0.05 us`, `c <= 8`, `knee at c=4 (+1.2%)` (no leftover `$...$` markers).
Open follow-ups (none blocking).
- Commit a baseline envelope under `data/results/calibration/<dpl>/` for the case-study reference (user runs this manually as part of the stage-4 wrap-up).
- `validate_experiment_against_envelope(exp_cfg, envelope)` helper: three guard checks against `c_max / r_max / w_max`. Adds 5 minutes when the experiment method (stage 5+) lands.
- Multiprocess-specific tuning: if `n_clients=8` still leaves the client as the bottleneck on slow boxes, the next step is multi-host load generation (`wrk2`, `vegeta`, `oha`); currently out of scope.
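
One possible shape for the `validate_experiment_against_envelope` follow-up, sketched under assumptions: the helper does not exist yet, and the `exp_cfg` keys (`c`, `r`, `w`) and the flat envelope layout are hypothetical stand-ins for whatever the experiment config ends up carrying.

```python
from typing import Dict, List


def validate_experiment_against_envelope(
    exp_cfg: Dict[str, float], envelope: Dict[str, float]
) -> List[str]:
    """Return one message per breached bound; an empty list means all three guards pass."""
    # (experiment key, envelope bound) pairs; the key names are assumptions.
    guards = (("c", "c_max"), ("r", "r_max"), ("w", "w_max"))
    breaches: List[str] = []
    for key, bound in guards:
        if exp_cfg[key] > envelope[bound]:
            breaches.append(f"{key}={exp_cfg[key]} exceeds {bound}={envelope[bound]}")
    return breaches


envelope = {"c_max": 8, "r_max": 200.0, "w_max": 4}
within = validate_experiment_against_envelope({"c": 4, "r": 150.0, "w": 2}, envelope)
outside = validate_experiment_against_envelope({"c": 16, "r": 150.0, "w": 2}, envelope)
```

Returning the breach list instead of raising keeps the gate data-only, in line with the report contract above; a caller can still raise on a non-empty result.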
Next step. Stage 5: wire the actual TAS service topology (`MAS_{1..3}`, `AS_{1..3}`, `DS_{3}`) under the experimental method, drive R1/R2 measurements against `data/config/profile/{dflt,opti}.json`, write per-request flow JSONLs and per-service CSVs. Calibration's `c_max / r_max / w_max` feed the experiment's bounds checker.
Decision. Close stage 3 of the src/experimental/ rebuild (log/prototype-refactor-plan.md) and run an audit pass over the new code.
What landed (stage 3). Six modules under src/experimental/prototype/runtime/:
- `server.py`: `ServerAdapter` ABC + `FastAPIAdapter` / `FlaskAdapter` + `Handler` protocol + `make_server_adapter` factory + `FlaskProcess` / `ManagedProcess` type aliases.
- `uvicorn_process.py` / `waitress_process.py` / `gunicorn_process.py`: `mp.spawn` process spawners, identical surface (`start / wait_ready / shutdown / is_alive`). Each registers in a module-level `WeakSet` + `atexit` hook for crash-path zombie cleanup. `GunicornProcess` raises on Windows pointing at `WaitressProcess`; gunicorn import gated by `try / except ImportError` (POSIX-only).
- `os_timer.py`: `windows_timer_resolution` ctxmgr (winmm wrapper, no-op on POSIX).
- `async_loop.py`: `run_async_safe` (Jupyter-safe sync entry); `CoroFactory: TypeAlias = Callable[[], Coroutine[Any, Any, Any]]`.
- `config.py`: loader for `data/config/method/experimental.json::server.{uvicorn,waitress,gunicorn}` runtime tuning blocks.
Test surface. tests/experimental/prototype/runtime/ mirrors the source 1:1; 50 tests, 90% coverage on the runtime package. Linux-only spawn paths exercised on Windows by mocking multiprocessing.get_context + httpx.get. Shared helpers added at tests/utils/exp/apps.py (FastAPI + Flask /healthz factories, picklable across mp.spawn) and a new tests/utils/exp/ports.py (free_port() + PORT_MOCK = 9042 sentinel).
Demo. tests/demo/runtime.py: python -m tests.demo.runtime brings up FastAPI (uvicorn) + Flask (waitress) side-by-side on free localhost ports, hits /healthz on each over real TCP, prints responses, tears down.
Deps pinned. flask==3.1.3, waitress==3.0.2, gunicorn==23.0.0; sys_platform != "win32".
Audit pass — what changed.
- Constants `_DFLT_*` privatised across all three spawners; constructors take `ready_timeout_s / terminate_grace_s / kill_grace_s` kwargs sourced from `experimental.json::server.<spawner>.*`.
- `ServerAdapter.wait_ready(timeout_s=None)` now propagates to the spawner's configured `_ready_timeout_s` instead of hard-coding 10.0.
- gunicorn import refactor: `if sys.platform != "win32":` block replaced with standard `try / except ImportError` optional-dep idiom; `_GunicornDriver` (renamed from `_GunicornApp`) lives at module scope unconditionally.
- Pyright literal-narrowing fix: platform check delegated to `_check_linux_or_raise()` so `__init__` body stays reachable.
- Dead code removed: vestigial `from ...async_loop import CoroFactory` re-export in `uvicorn_process.py`; `AttributeError` from `os_timer.py` exception list (cannot fire after `sys.platform` guard); literal-narrowed `if sys.platform == "win32":` branch in `async_loop._worker_run_coro` (Python 3.8+ `WindowsProactorEventLoopPolicy` already returns the right loop class).
- Test cleanup: `test_flask_picks_gunicorn` deleted (duplicate of `test_make_flask_gunicorn`); vacuous `assert GunicornProcess is not None` deleted; nested `def`s in `test_async_loop.py` lifted to module scope; lazy imports moved to module top in `test_uvicorn_process.py` + `test_waitress_process.py`; `port=0` / `port=1` sentinel values replaced with 9042+ via `PORT_MOCK` and `free_port()`; `tests/experimental/prototype/runtime/conftest.py` deleted (only one `conftest.py` per `tests/experimental/`); test method names shortened across the runtime suite.
- Module + class + method docstrings rewritten across the runtime package + test mirror: shorter, plain-language, less code-symbol density.
- Atexit hook + `_LIVE_PROCESSES` registry inline-commented across all three spawners (one-liner `# Crash-path safety net.` and `# Live spawners; atexit cleans these up on exit.`).
Stop-gate verification.
- `pytest tests/experimental/` -> 135 passed (50 new under `runtime/`).
- `pytest tests/` -> 308 passed (full surviving suite + new runtime tests).
- `python -m tests.demo.{log_format,client,runtime}` all run cleanly.
- `grep -r "src\.experiment\|src\.calibration\|MockTransport" src/` -> zero hits (transport mock allowed only in `tests/`).
- Coverage: 95% on `src/experimental/`; 90% on `src/experimental/prototype/runtime/` specifically. Above the 80% gate.
Conventions captured elsewhere this turn.
- Memory: project_experimental_stage_3_closed_2026_05_08.md records the stage-3 closure + naming pins (mocking pattern for Linux-only spawn, app-factory pickling rule).
- New rule pinned: app factories that cross `mp.spawn` boundaries MUST be top-level functions (closures + lambdas don't survive Windows pickle).
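
The pinned rule follows from how the `spawn` start method hands its target to the child: the callable is pickled, and functions pickle by qualified name. A small check, using a hypothetical `build_app` stand-in and no real spawning, shows which factory shapes survive:

```python
import functools
import pickle


def build_app(name: str) -> dict:
    """Top-level stand-in for an app builder (hypothetical)."""
    return {"app": name}


def survives_pickle(factory) -> bool:
    """True when the callable can be pickled, which spawn needs for its target."""
    try:
        pickle.dumps(factory)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_closure_factory(name: str):
    def _inner() -> dict:
        return build_app(name)

    return _inner


# A partial over a module-level function pickles by reference to that function.
top_level_ok = survives_pickle(functools.partial(build_app, "gauge"))
lambda_ok = survives_pickle(lambda: build_app("gauge"))        # False
closure_ok = survives_pickle(make_closure_factory("gauge"))    # False
```

This check only exercises `pickle.dumps`; the child process must also be able to import the module that defines the factory.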
Next steps.
- Stage 4: calibration ping/echo (`prototype/calibration/{vernier,hoststats,rate,envelope,gate}.py`); `tests/demo/vernier.py`; first end-to-end run of the apparatus through a 1-service mesh.
- Stage 5: deployment options (`localhost` / `multiprocess` / `remote`); calibration reruns under all three.
2026-05-07 — notes/case-study.md rebuilt as full ACS 6-section reconstruction; SVG-crop pattern locked
Decision. Replace the ad-hoc 129-line notes/case-study.md with a clean ACS 6-section reconstruction merged from __OLD__/notes/context.md (long-form draft) and __OLD__/notes/objective.md (concise version). Lock down an inline-SVG figure-embedding pattern that survives VS Code markdown preview's HTML sanitizer.
What was added (case-study.md).
- Six sections: `1. Summary`, `2. Technical Specifications`, `3. Architectural Reconstruction`, `4. Limits`, `5. Insights`, `6. Design Notes`, plus `7. References`.
- `Table CS1.1. *TAS* case specification.` absorbs the prose front matter (Source documents, Methodology, Status, Scope) into table rows alongside the existing identity rows.
- Numbered headings: `## 1.` … `## 7.` for H2; lowercase Roman `### i.` … `### x.` for H3.
- Short identifiers: `RQ-CS1.k` collapsed to `RQ.k`, `ADR-CS1-XX` collapsed to `ADR.XX` (the whole note is CS-1; the infix repeated context).
- Acronym first-use expansions: `ACS`, `QA`, `MAPE-K` (Monitor-Analyse-Plan-Execute-Knowledge), `STA` (stochastic-timed-automata), `SOA` (Service-Oriented Architecture), `PCA` (Principal Component Analysis), `RSEM` (relative standard error of the mean), `ADR` (Architectural Decision Record).
- Stripped: MATI / `[11]` references, in-repo path pointers (`assets/docs/CS/N1/`, `.claude/skills/...`, `src/methods/<method>.py + 0N-<method>.ipynb`), and DASA / methodology-token mentions.
- Cross-source inconsistency table preserved verbatim from `__OLD__/notes/context.md` (14 rows, then 13 after dropping the [11]-only metric-count row).
What was added (SVG figure crop). Each of the four reconstruction figures (CS1.1-CS1.4) is wrapped in:
<figure style="margin:0">
<svg version="1.1" viewBox="0 0 W H" width="100%" preserveAspectRatio="xMinYMin meet" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" role="img" aria-label="...">
<clipPath id="clip_id"><path d="M0 0hWvHH0z"/></clipPath>
<g clip-path="url(#clip_id)"><image xlink:href="../assets/img/cs1/<file>.svg" width="ORIG_W" height="ORIG_H"/></g>
</svg>
</figure>

Per-figure crops in place after iterative tuning:

| Figure | viewBox (W × H) | Inner image (W × H) | Right crop |
|---|---|---|---|
| CS1.1 context | 2087 × 1456 | 2940 × 1456 | 29 % |
| CS1.2 workflow | 2697 × 2036 | 3210 × 2036 | 16 % |
| CS1.3 services | 2771 × 1698 | 3379 × 1698 | 18 % |
| CS1.4 adaptability | 1613 × 1380 | 2407 × 1380 | 33 % |
The five scenario figures (CS1.5a-e) keep plain markdown image syntax.
Why this SVG pattern. VS Code's markdown preview is the publishing target. Three patterns were tried and rejected before this one stuck:
- `<div style="overflow:hidden;line-height:0">` + `<img style="width:150%;margin:0 -50% 0 0">`. The CSS clip is silently sanitized; the image renders at full size with no crop.
- `<figure style="overflow:hidden">` + same `<img>` styles. Same outcome; the sanitizer strips the wrapper's clipping CSS regardless of element type.
- Bare inline `<svg viewBox=...>` with `<image href=...>` inside. The markdown parser doesn't recognise `<svg>` in its HTML-block whitelist, so the markup renders as code text.
The working pattern needs three things together: a `<figure>` opener (recognised by markdown-it as block HTML), a fully-namespaced `<svg version="1.1" xmlns="..." xmlns:xlink="...">`, and a `<clipPath>` + `<g clip-path="url(#...)">` wrapping an `<image xlink:href="...">` reference. `viewBox` alone is not enough; the explicit `<clipPath>` is what survives the sanitizer.
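
A small generator for the wrapper, sketched from the template and crop table above. The function names, the `example.svg` file name, the clip id, and the label are hypothetical; the right-crop figure is taken to be the share of the original width that falls outside the viewBox, which reproduces the percentages in the table.

```python
def right_crop_pct(view_w: int, orig_w: int) -> float:
    """Share of the original width hidden to the right of the viewBox, in percent."""
    return 100.0 * (1.0 - view_w / orig_w)


def figure_markup(view_w: int, view_h: int, orig_w: int, orig_h: int,
                  href: str, clip_id: str, label: str) -> str:
    """Build the figure + svg + clipPath wrapper for one cropped figure."""
    return "\n".join([
        '<figure style="margin:0">',
        f'<svg version="1.1" viewBox="0 0 {view_w} {view_h}" width="100%" '
        'preserveAspectRatio="xMinYMin meet" xmlns="http://www.w3.org/2000/svg" '
        f'xmlns:xlink="http://www.w3.org/1999/xlink" role="img" aria-label="{label}">',
        f'<clipPath id="{clip_id}"><path d="M0 0h{view_w}v{view_h}H0z"/></clipPath>',
        f'<g clip-path="url(#{clip_id})"><image xlink:href="{href}" '
        f'width="{orig_w}" height="{orig_h}"/></g>',
        "</svg>",
        "</figure>",
    ])


# CS1.1 context row: viewBox 2087 x 1456 over a 2940 x 1456 original.
crop = round(right_crop_pct(2087, 2940))  # 29
markup = figure_markup(2087, 1456, 2940, 1456,
                       "../assets/img/cs1/example.svg", "clip_cs1_1", "CS1.1 context")
```

Keeping the crop as arithmetic over the two widths makes the table reproducible when a figure is re-exported at a different size.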
Conventions captured elsewhere this turn.
- `.claude/skills/write/arch-case-study.md` extended with three new lessons (folded identity table, numbered headings + lowercase-Roman subsections, short identifiers when the case is unambiguous) and a new "Embedding SVG figures" subsection codifying the figure-clipPath pattern.
- `CLAUDE.md` gained a short "Markdown Figure Embedding" pointer paragraph.
Next steps.
- Apply the same numbered-section + identity-table conventions to the CS-2 IoT-SDP case-study note in the sibling repo.
- If `notes/procedure.md` and `notes/prototype.md` (currently 1-byte stubs) gain content, mirror the conventions here.
- Verify on the GitHub-rendered preview (not just VS Code) that the inline-SVG-with-clipPath pattern still renders cropped; GitHub's sanitizer may have different rules.
Cross-cutting archive sweep that clears the slate before the next major refactor (the new software-architecture experiment). Plan-of-record at log/cleaning.md; memory entry at memory/project_cleaning_sweep_2026_05_06.md.
Phase 1 — archive sweep (filesystem-only; __OLD__/ is gitignored):
- Notebooks: `00-calibration.ipynb`, `05-experimental.ipynb` → `__OLD__/`. Surviving root notebooks: `01-04`.
- Source: `src/{calibration,experiment,scripts}/`, `src/methods/{calibration,experiment}.py`, `src/dimensional/{dasa_sweep,dasaprof}.py`, `src/io/tooling.py`, `src/view/characterization.py` → `__OLD__/`. Fresh `__init__.py` written for `src/{methods,dimensional,io}/` (drop archived re-exports). `src/view/__init__.py` edited in place to drop the three `plot_calib_*` re-exports.
- Tests: `tests/{calibration,experiment,scripts,demos}/`, `tests/methods/test_{calibration,experiment}.py`, `tests/dimensional/test_{dasa_sweep,dasaprof}.py`, `tests/io/test_tooling.py`, `tests/utils/helpers.py` → `__OLD__/`.
- Notes: 7 files (`calibration.md`, `commands.md`, `prototype.md`, `prototype-constraints.md`, `prototype-v2.md`, `soa-refactor.md`, `workflow.md`) → `__OLD__/notes/`. `notes/devlog.md` → `log/devlog.md` (true move; this file).
- Data: `data/config/method/{experiment,calibration}.json`, `data/results/{experiment,calibration}/`, `data/img/experiment/` → `__OLD__/`.
- `pyproject.toml`: dropped the orphan `live_mesh` pytest marker (every consumer archived).
- Stop-gate: `pytest tests/ -q` = 180 passed in 209 s on the surviving subset.
Phase 2 — reorganise surviving surface:
- `README.md` rewritten short (~58 lines, was 262) by folding `notes/SUMMARY.md` + `notes/quickstart.md` (both deleted, content carried forward).
- `notes/case-study.md` new (~129 lines) by folding `notes/objective.md` + `notes/context.md` (both deleted, content carried forward; cross-source inconsistency table left at `__OLD__/notes/context.md`).
- `notes/procedure.md` + `notes/prototype.md` scaffolded as design-doc skeletons for the new build.
- `CLAUDE.md` rewritten leaner (213 lines, was huge): kept style / coding / notebook / view / testing / commit conventions + PyDASA notes + Migration-from-`__OLD__/` section; dropped calibration / experiment / `dpl` / scripts / async-ctxmgr blocks.
- `memory/MEMORY.md` index updated with the cleaning entry on top + a one-line warning that older entries reference paths now under `__OLD__/`.
- `src/experimental/` (`__init__.py` + empty `procedure/` + `prototype/`) is the new build's scaffold; left in place untouched.
Decisions resolved during the sweep (full table in log/cleaning.md):
- `data/config/method/calibration.json` archived alongside `experiment.json` (symmetric with `src/methods/calibration.py`).
- `assets/docs/architecture_experimentation.md` + `assets/docs/operational_analysis.md` left in place (still useful reference; recover from history if archive is later preferred).
- Empty scaffolds `log/prototype.md` + `log/procedure.md` deleted (the new `notes/prototype.md` + `notes/procedure.md` are blank-slate writes).
- Commit shape: user committed manually after Phase 1; Phase 2 staged for the next commit.
Why a clean slate: the prior FastAPI-mesh experiment build had landed two large refactors in the past month (prototype-v2 reshuffle + calibration C0-C11). Methodological audit identified the dpl="localhost" MockTransport as monolithic-pretending-to-be-SOA; the SOA refactor (Phase A2-A9 + Phase B) was queued to fix that. Rather than continue stacking refactors on top, retire the build into __OLD__/ and rebuild the experiment from scratch under src/experimental/ with the methodology constraints learnt from the previous build. The case study (notes/case-study.md), the surviving methods (analytic, stochastic, dimensional), and the conventions stay; the apparatus is the part being reset.
Closed the 11-stage calibration refactor opened 2026-05-03 evening. src/methods/calibration.py shrunk 2640 → 844 lines (68%); the architectural-conformance findings (PyDASA pipeline duplication, dim-card-in-orchestrator-layer, multi-combo sweep in methods/) all resolved by relocation alone. The calibration package now follows the methodology layering: precondition-gate building blocks under src/calibration/, model artefact under src/dimensional/, thin orchestrator under src/methods/. Closure record at notes/calibration.md; per-stage outcome at memory/project_calibration_refactor_closed_2026_05_06.md.
C9b-Phase 1 src/dimensional/dasa_sweep.py — new home for run_calib_sweep + helpers (_drive_one_combo, _resolve_mu_anchor, _resolve_sweep_grid, _build_sweep_output_path). The sweep is the multi-combo (c, K, mu_factor) dimensional sensitivity probe over a calibrated mu anchor — it's a dimensional-card sweep, not an orchestrator concern, so it lives next to dasaprof.py. Cross-imports _drive_lambda_step + _post_one from src/calibration/rate.py (the lambda-stepping engine stays in the rate module; dasa_sweep is the multi-combo composition that feeds rate stepping per (c, K, mu_factor) cell). 11 tests covering the mu-anchor table (4 paths: explicit / loopback / unknown / zero-degenerate), sweep-grid resolution (explicit / fallback to JSON), output-path shape (host normalisation, _sweep filename suffix, per-dpl subdir), and run_calib_sweep empty-input early-returns.
C9b-Phase 2 src/methods/calibration.py rewritten from scratch — kept run() + CLI main() + _run_async_probes + zombie cleanup + path helpers; everything else removed. Re-export aliases at module top preserve the public + test-private surface (run, derive_calib_coefs, run_rate_sweep, run_calib_sweep, run_handler_stability_sweep + 6 underscore symbols _aggregate_rate_trials / _batch_size_for / _find_highest_sustainable_rate / _parse_rates / _resolve_mu_anchor / _CALIB_DIM_TAG) so existing notebook + CLI + experiment-method gate continue to work without code changes. 8 monkeypatch sites in tests/methods/test_calibration.py retargeted to string-path form (monkeypatch.setattr("src.dimensional.dasa_sweep._drive_one_combo", ...)) since attribute-form patches now hit the orchestrator's namespace, not the new module's binding. 11/11 retargeted tests pass in 1:49.
C10 notebook 00-calibration.ipynb — cell-1 imports updated to use new package locations directly: from src.dimensional import derive_calib_coefs, run_calib_sweep; run stays imported from src.methods.calibration (orchestrator). 25 cells; JSON valid; all 5 callables resolve to expected modules (verified: derive_calib_coefs source: src.dimensional.dasaprof, run_calib_sweep source: src.dimensional.dasa_sweep, run source: src.methods.calibration).
C11 final docs pass — notes/calibration.md marked "REFACTOR CLOSED" with all 11 stage rows showing ✅ DONE; CLAUDE.md "Calibration" callout rewritten from "refactor in progress" to a closed-state callout with the 13-row module-map table; MEMORY.md top entry replaced with the closure record project_calibration_refactor_closed_2026_05_06.md.
Decision-log items:
- Picked option (a) for C9b: built `src/dimensional/dasa_sweep.py` as new home rather than accepting a ~1400-line orchestrator. The 844-line final shape of `src/methods/calibration.py` still misses the ~250-line target stated in the original plan; the gap is path helpers + `_run_async_probes` + zombie cleanup that legitimately belong in the orchestrator (they wire CLI args + stop conditions to the `SweepController`, which is the orchestrator's job).
- Re-export aliases vs full test rewrite: chose re-export aliases at module top so the existing 6 underscore-private test-symbol references survive without touching test bodies; the price is one extra layer of indirection from caller to implementation. Pinned in `notes/calibration.md` as "test-private-symbol re-export pattern".
- Monkeypatch retargeting via string-path: tests using `monkeypatch.setattr(cal, "_X", ...)` no longer affect the new package's bindings (they patch the orchestrator's namespace, not the new module's binding). Fix: use the string-path form `monkeypatch.setattr("src.dimensional.dasa_sweep._X", ...)` to patch where the function is actually looked up. Future cleanup deferred: replace these with public injectable hooks on `SweepController`.
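
Why the attribute-form patch stopped working can be reproduced without pytest. The sketch below builds two throwaway in-memory modules (hypothetical names `impl_sketch` and `facade_sketch`) and borrows `_drive_one_combo` / `run_calib_sweep` purely as labels; the function bodies are stand-ins, not the project's code.

```python
import types

# "Implementation" module: run_calib_sweep resolves _drive_one_combo
# through this module's globals at call time.
impl = types.ModuleType("impl_sketch")
exec(
    "def _drive_one_combo():\n"
    "    return 'real'\n"
    "\n"
    "def run_calib_sweep():\n"
    "    return _drive_one_combo()\n",
    impl.__dict__,
)

# "Facade" module: re-export aliases that point at the same function objects.
facade = types.ModuleType("facade_sketch")
facade._drive_one_combo = impl._drive_one_combo
facade.run_calib_sweep = impl.run_calib_sweep

# Patching the facade alias rebinds a name nobody looks up.
facade._drive_one_combo = lambda: "fake"
after_facade_patch = facade.run_calib_sweep()  # 'real'

# Patching the defining module hits the actual lookup site.
impl._drive_one_combo = lambda: "fake"
after_impl_patch = facade.run_calib_sweep()  # 'fake'
```

The string-path form of `monkeypatch.setattr` does the second kind of patch: it resolves the dotted path to the defining module and rebinds the name there.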
Validation: 102 tests passing in fast scope (29s — tests/calibration/ + tests/dimensional/ + tests/io/test_tooling.py); 11 monkeypatch-retargeted tests passing in 1:49 (tests/methods/test_calibration.py); legacy 386-pass tests/{experiment,methods,io,scripts}/ baseline unaffected. Smoke check: from src.dimensional import derive_calib_coefs, run_calib_sweep resolves; from src.methods.calibration import run resolves through re-export.
Followups deferred:
- `tests/methods/test_calibration.py` reaches into private symbols of the new package via string-path monkeypatch. Future cleanup: replace with public injectable hooks on `SweepController`.
- `run_async_safe` is annotated `-> Dict[str, Any]`; three call sites use `cast(...)` to recover concrete types. Long-term fix: make `run_async_safe` generic via TypeVar in `src/experiment/runtime/async_loop.py` so the casts collapse.
- IMG path `data/img/experiment/calibration/` not migrated (only RESULTS path moved per Q-B). Move to `data/img/calibration/` for symmetry as a separate decision.
- `controller.yoly_dataframe()` API not yet implemented; notebook's yoly chart cell still calls `derive_calib_coefs` directly.
Successor task: SOA Phase A Stages A2-A9 per notes/soa-refactor.md. The three building blocks calibration delivered (UvicornProcess, make_gauge_factory, per-dpl envelope writer) are exactly what Phase A's experiment mesh needs. Confirm with user before starting A2 — it's a substantial new piece of work that warrants explicit signoff.
Closed the long-standing TODO from the 2026-04-18 conftest.py decision. Created pyproject.toml at repo root carrying [tool.pytest.ini_options] only (pythonpath = ["."] for from src.* import ... resolution + markers = ["live_mesh: ..."] for the custom marker registration). Deleted root conftest.py. tests/conftest.py is untouched and still owns the shared PyDASA fixtures (method_cfg, dflt_profile, opti_profile, schema, tas1_vars, engine_bare, engine_ready, sensitivity_results).
Decision-log items:
- Kept the file minimal: no `[build-system]` or `[project]` block. This repo is a case-study deliverable (notebooks + figures + metrics), not a distributable wheel; adding a build system would invite scope creep with no consumer.
- The `live_mesh` marker registration moved verbatim from the deleted `conftest.py::pytest_configure` hook. The marker description was reaffirmed during the swap: it covers any test that spins up a real FastAPI mesh (`UvicornThread` in-process, `UvicornProcess` out-of-process, or multi-trial sweeps over either). It names the mesh-spin-up cost axis, not the process-distribution axis; multiprocess is a strict subset, not a synonym.
Verification: pytest --collect-only -q collected 640 tests post-swap; pytest tests/utils/ (3 passed in 0.10s) confirms the path resolution; pytest tests/dimensional/ (92 passed in 91s) confirms the session/module-scoped fixtures from tests/conftest.py still resolve. No unknown-marker warnings.
Pickup: when ruff / mypy / hatchling config lands later, extend the existing pyproject.toml rather than spawning a sibling ruff.toml / setup.cfg. If a new pytest marker is introduced (e.g. a multiprocess-only live_mesh_mp subset), register it in the markers list.
C8 (src/dimensional/dasaprof.py): relocated derive_calib_coefs + its helper stack (_build_calib_observables, _calib_var_sym, _build_calib_vars, _run_calib_pipeline) from src/methods/calibration.py to a new src/dimensional/dasaprof.py, re-exported from src/dimensional/__init__.py. The pipeline already called src.dimensional.build_engine + build_schema; the C8 deliverable was the layering fix (move DOWN from methods/ orchestrator to dimensional/ model-artefact). Honest call: the canonical src/dimensional/coefficients.py::derive_coefs is shaped for TAS-architecture Pi-indexed specs and CANNOT directly serve calibration's standalone-artifact variable set, so the two paths legitimately stay siblings: derive_coefs (TAS, Pi-indexed) + derive_calib_coefs (calibration, base-variable expressions). Both correctly live under src/dimensional/. Byte-identical regression: 3 tests in tests/dimensional/test_dasaprof.py (single-K zero-payload, multi-K with 128 kB payload, custom subscript tag) all green; the new path produces identical output to the legacy src/methods/calibration.py::derive_calib_coefs for the same envelope inputs.
C9a (path migration): 51 on-disk JSONs migrated from data/results/experiment/calibration/ to data/results/calibration/localhost/ via a single mv. Updated src/io/tooling.py::_CALIB_DIR to data/results/calibration/<dpl>/ (with new _CALIB_ROOT constant + new dpl parameter on find_latest_calibration / load_latest_calibration, defaulting to "localhost" for back-compat). Updated src/methods/calibration.py::_CALIB_DIR to match. Fixture tests/io/test_tooling.py::_isolated_calib_dir updated to monkeypatch _CALIB_ROOT (and the legacy _CALIB_DIR alias) so the per-dpl subdir resolution is exercised correctly. Stop-gate pytest tests/calibration/ tests/dimensional/ tests/io/ tests/scripts/ = 136 passed in 40s; broader run including tests/methods/test_calibration.py = 231 passed in 35:46 (3 of which were the byte-identical C8 regression tests). Smoke check: find_latest_calibration(socket.gethostname()) resolves correctly to the migrated tree.
Decision-log items:
- C8: scope honesty. The architectural-conformance report flagged "PyDASA pipeline duplication" between `_run_calib_pipeline` and `derive_coefs`. Closer reading: `_run_calib_pipeline` already calls `build_engine` + `build_schema`; the only duplication is `pydasa.Coefficient(...)` construction, which is INTENTIONAL because the calibration variable set + FDU count differ from the TAS topology so Pi-group ordering shifts. Forcing calibration through `derive_coefs` would require either widening that API (out of scope) or maintaining a parallel calibration spec block in `dimensional.json` (schema duplication). The two paths legitimately stay siblings; the layering bug is fixed by relocation alone. Pinned in `notes/calibration.md`.
- C9a: backward-compat strategy. `find_latest_calibration` and `load_latest_calibration` gained a `dpl` parameter rather than splitting into `find_latest_calibration_localhost / _multiprocess`. Default `"localhost"` keeps every existing call site working unchanged. The `experiment.py::_resolve_baseline` gate will pick up `dpl="multiprocess"` once the experiment runs in that mode (post SOA Phase A2).
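
The defaulted-`dpl` strategy can be sketched as follows. This is not the body of `src/io/tooling.py`: the explicit `root` argument, the glob pattern, and the sample file names are assumptions made to keep the example self-contained; the host normalisation and mtime ordering follow the envelope notes elsewhere in this log.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_latest_calibration(root: Path, host: str, dpl: str = "localhost") -> Optional[Path]:
    """Newest <host>_*.json under root/<dpl>/, or None when nothing matches."""
    folder = root / dpl
    if not folder.is_dir():
        return None
    prefix = host.replace(" ", "-")  # space-to-hyphen host normalisation
    candidates = list(folder.glob(f"{prefix}_*.json"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    samples = [
        ("localhost", "box_20260501_000000.json", 100),
        ("localhost", "box_20260502_000000.json", 200),
        ("multiprocess", "box_20260503_000000.json", 300),
    ]
    for dpl_name, file_name, mtime in samples:
        path = root / dpl_name / file_name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("{}")
        os.utime(path, (mtime, mtime))  # pin mtimes so the ordering is deterministic
    default_hit = find_latest_calibration(root, "box").name
    mp_hit = find_latest_calibration(root, "box", dpl="multiprocess").name
    missing = find_latest_calibration(root, "box", dpl="remote")
```

Existing callers that never pass `dpl` keep reading the `localhost` tree, which is the whole point of the default.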
C9b paused for scoping decision (see notes/calibration.md "C9b scoping" section). Three options: (a) build src/dimensional/dasa_sweep.py to home run_calib_sweep, then methods/calibration.py truly shrinks to ~300 lines (recommended; closes the original 250-line target); (b) accept a ~1400-line orchestrator (faster but undersells the refactor); (c) aliases-only (cosmetic). Pickup: pick an option, then execute. Both old src/methods/calibration.py and the new src/calibration/ package work end-to-end today, so there's no urgency.
Aggregate state at close: 9 new src modules under src/calibration/ + src/dimensional/dasaprof.py; 241 new tests across the calibration + dimensional packages, all green; old src/methods/calibration.py (2640 lines) functional but contains duplicates that will be deleted in C9b once the home for run_calib_sweep is decided.
Long autonomous session executing the calibration refactor plan written in notes/calibration.md. Eight new modules under src/calibration/ plus the UvicornProcess runtime extension; 113 new tests across 8 source files, all green; full per-stage audit pass against .claude/skills/develop/coding-conventions.md + .claude/skills/code/code-documentation.md.
Stages closed:
- C1 `src/experiment/runtime/uvicorn_process.py`: `UvicornProcess` mirrors `UvicornThread` API but spawns `multiprocessing.Process` with picklable `app_factory`. Windows `spawn` semantics validated end-to-end against the gauge: factory pickles, child process re-imports the module, FastAPI app builds in the worker, `/healthz` answers 200. Doubles as SOA Phase A Stage A1. 8 tests; type-fix `Optional[mp_process.BaseProcess]` (pyright caught `SpawnProcess != mp.Process`); composition over inheritance (sibling-symmetry with `UvicornThread`'s `threading.Thread` subclassing decided against because `multiprocessing` is API-shaped for composition + we need explicit `spawn`-pinning for per-PID seed reproducibility).
- C2 `src/experiment/instances/gauge.py`: `make_gauge_factory(spec, payload_size_bytes)` returns a `functools.partial(build_gauge, spec, payload_size_bytes)`. Both `build_gauge` (module-scope) and `SvcSpec` (frozen dataclass over primitives) are picklable across the Windows spawn boundary, so the factory survives `multiprocessing.Process(target=worker, args=(factory, ...))`. The naming asymmetry vs `build_gauge` is intentional and signals the return-type distinction (FastAPI app vs `Callable[[], FastAPI]`). 10 tests including a live spawn-via-factory smoke.
- C3 `src/calibration/conditionals.py`: `StopConditions` frozen dataclass with locked-decision defaults (`rejection=5.0`, `phi=1.0`, `sigma=2.0`, `loopback_delta=5.0`); pure predicates `should_stop`, `should_stop_detailed` (returns provenance dict for envelope), `loopback_two_trial_ok`. 32 boundary tests covering rejection-strict-greater + phi-greater-or-equal + sigma-strict-greater semantics + multi-trip precedence + symmetric loopback delta + non-positive medians raise.
- C4 `src/calibration/envelope.py`: per-`dpl` JSON I/O. `output_path / write_envelope / find_latest / load_latest`. Path shape `data/results/calibration/<dpl>/<host>_<YYYYMMDD_HHMMSS>.json` (Q-B locked: drops the `/experiment/` segment). Atomic write (temp + rename); host-prefix glob with space-to-hyphen normalisation; mtime ordering. STILL PENDING for C9: one-shot `mv` of the 47 existing JSONs from `data/results/experiment/calibration/` to the new path + `src/io/tooling.py::_CALIB_DIR` switch. 18 tests.
- C5 `src/calibration/hoststats.py`: host-floor probes `snapshot_host_profile / measure_timer / measure_jitter / measure_loopback / measure_handler_scaling` plus the canonical stats helpers `stats_from_us_array / stats_from_us_status_pairs` (renamed from leading-underscore now that they cross the package boundary). Lands ADDITIVELY: the duplicate code in `src/methods/calibration.py` stays untouched until C9. 11 tests (9 inline + 2 live_mesh).
- C6 `src/calibration/rate.py` + `stability.py`: rate-saturation discovery (`run_rate_sweep / find_highest_sustainable_rate / batch_size_for`) and apparatus self-consistency (`run_handler_stability_sweep / aggregate_stability_cell / select_c_per_n_con_usr`). Both use the new `make_gauge_factory` from C2 and `run_async_safe` from `src.experiment.runtime` for the sync→async bridge (replacing the old `_run_sweep_in_dedicated_loop` shim). Type-cast at the `run_async_safe` boundary in both modules (pyright: `run_async_safe -> Dict[str, Any]` widens; cast restores the concrete `Dict[float, ...]` shape so `.get(_rate, [])` and `.items()` type-check; long-term fix is to make `run_async_safe` generic via TypeVar, deferred to C9). 21 tests covering the pure helpers; full sweeps deferred behind `@pytest.mark.live_mesh`.
- C7 `src/calibration/controller.py`: composition layer. `HostSweepGrid` + `DasaSweepGrid` frozen dataclasses with `from_config` classmethods reading `calibration.json` partials (defaults match the JSON one-for-one). `SweepController` holds `host_grid` + `dasa_grid` + `stop` + `dpl`; `_spawn_gauge` branches on `dpl` between `UvicornThread` (localhost) and `UvicornProcess` (multiprocess); `run_host_sweep` composes timer + jitter + loopback + handler_scaling + optional rate_sweep + optional stability_sweep into one envelope; `run_dasa_sweep` accepts an injected `deriver` callable so the controller stays decoupled from `src/dimensional/` (Stage C8 will pass `deriver=derive_calib_coefs`). 13 tests including a live end-to-end host-sweep on `dpl="localhost"`.
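
The C3 boundary semantics (rejection strictly greater, phi greater-or-equal, sigma strictly greater) can be sketched as below. The field names and defaults come from the C3 entry; the predicate's argument list and the reading of each observed value are assumptions, and multi-trip precedence, `should_stop_detailed`, and the loopback check are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopConditions:
    """Thresholds with the locked-decision defaults listed under C3."""

    rejection: float = 5.0
    phi: float = 1.0
    sigma: float = 2.0
    loopback_delta: float = 5.0


def should_stop(stop: StopConditions, reject_pct: float, phi: float, sigma: float) -> bool:
    """True when any threshold trips; the argument meanings are assumed, not sourced."""
    return (
        reject_pct > stop.rejection  # strict: exactly 5.0 does not trip
        or phi >= stop.phi           # inclusive: exactly 1.0 trips
        or sigma > stop.sigma        # strict: exactly 2.0 does not trip
    )


stop = StopConditions()
at_boundaries = should_stop(stop, reject_pct=5.0, phi=0.99, sigma=2.0)  # False
phi_trip = should_stop(stop, reject_pct=0.0, phi=1.0, sigma=0.0)        # True
```

The asymmetry between strict and inclusive comparisons is what the boundary tests pin down, so a sketch like this is mainly useful as a reading aid for those tests.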
Audit pass against both skills for each new module: zero em-dashes, non-circular docstrings preserving Args/Returns/Raises, short topic+outcome test names (mapping ~28 → ~17 chars average), one-test-class-per-source-module, top-level imports only, callable-class for stateful mocks (none needed in this batch), raise SomeError(_msg) with extracted message. Three rounds of audit per the user's /skills request — every C1/C2/C3/C5/C6/C7 file pair revisited; the _free_port helper docstring linter-stomp surfaced and was restored. Lessons: the sibling pair (UvicornThread / UvicornProcess) and (build_gauge / make_gauge_factory) keep their style and structure in lockstep; make_* vs build_* is a load-bearing distinction (return-type marker), not a naming inconsistency.
Type-fix carry-overs: Optional[mp_process.BaseProcess] for UvicornProcess._proc (one-line); cast(Dict[K, V], run_async_safe(...)) at three call sites (rate.py + stability.py + controller.py). Long-term run_async_safe should be generic over the coroutine return type; deferred to C9 as a single edit in src/experiment/runtime/async_loop.py so all three cast call sites collapse.
Old code in src/methods/calibration.py is UNTOUCHED. Tests for the existing module + the experiment-method gate in src/methods/experiment.py::_resolve_baseline continue to read from the old data/results/experiment/calibration/ path. The new package is purely additive; the swap-over happens in C9.
Paused before C8. C8 is the most consequential remaining stage because it requires a byte-identical regression test: derive_calib_coefs (currently in src/methods/calibration.py) must be moved to src/dimensional/dasaprof.py AND rewritten to call src/dimensional/engine.py::build_engine + src/dimensional/coefficients.py::derive_coefs instead of duplicating the PyDASA Schema → AnalysisEngine → Coefficient → MonteCarloSimulation(mode=DATA) pipeline. Risk: silent dim-card value drift if either pipeline interprets the input observables differently. Mitigation plan: feed a fixed-seed envelope into BOTH paths and assert the dimensional_card block is byte-identical before declaring C8 done.
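
The mitigation plan amounts to comparing canonical bytes of the `dimensional_card` block from both paths. A minimal sketch, with `canonical_bytes`, `cards_match`, and the two stand-in derivers all hypothetical (the real derivers take a calibration envelope and run the PyDASA pipeline):

```python
import json
from typing import Any, Callable, Dict

Deriver = Callable[[Dict[str, Any]], Dict[str, Any]]


def canonical_bytes(card: Dict[str, Any]) -> bytes:
    """Serialise with sorted keys and fixed separators so equal cards give equal bytes."""
    return json.dumps(card, sort_keys=True, separators=(",", ":")).encode("utf-8")


def cards_match(envelope: Dict[str, Any], legacy: Deriver, relocated: Deriver) -> bool:
    """Feed the same envelope to both paths and compare the dimensional_card bytes."""
    old = canonical_bytes(legacy(envelope)["dimensional_card"])
    new = canonical_bytes(relocated(envelope)["dimensional_card"])
    return old == new


# Stand-in derivers: same values, different key order; plus one with drifted output.
def _legacy(env: Dict[str, Any]) -> Dict[str, Any]:
    return {"dimensional_card": {"pi_1": env["mu"] * 2.0, "pi_2": 0.25}}


def _relocated(env: Dict[str, Any]) -> Dict[str, Any]:
    return {"dimensional_card": {"pi_2": 0.25, "pi_1": env["mu"] * 2.0}}


def _drifted(env: Dict[str, Any]) -> Dict[str, Any]:
    return {"dimensional_card": {"pi_1": env["mu"] * 2.0 + 1e-9, "pi_2": 0.25}}


fixed_envelope = {"mu": 125.0}
same = cards_match(fixed_envelope, _legacy, _relocated)  # True
drift = cards_match(fixed_envelope, _legacy, _drifted)   # False
```

Sorting keys removes ordering noise, so any remaining byte difference reflects a value difference, which is the silent drift the regression test is meant to catch.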
Pickup at next session: C8. Tracker remains notes/calibration.md (per-stage status table updated for every completed stage). C9-C11 outline:
- C9: shrink src/methods/calibration.py to ~250-line orchestrator, switch consumers to import from src/calibration/ + src/dimensional/dasaprof.py, delete the duplicates extracted in C5/C6/C8, migrate the 47 on-disk JSONs (the C4-pending mv), update src/io/tooling.py::_CALIB_DIR.
- C10: 00-calibration.ipynb migrated to new imports; yoly chart cell calls controller.yoly_dataframe() instead of running probes inline.
- C11: CLAUDE.md "Module map" reflects new structure; notes/calibration.md marked "refactor closed"; MEMORY.md updated.
Following the code-review + architectural-conformance report (notes/reports/code_review_calibration_2026-05-03.md), the user proposed a refactor of src/methods/calibration.py aligned with .claude/skills/design/experimental-design.md §1 ("Calibration is a precondition gate, NOT a hypothesis tolerance").
Locked decisions (all in notes/calibration.md "Refactor — locked decisions" table):
- I-1: stop on reject_rate > 5% (any cell rejecting > 5% has crossed out of the M/M/c/K validity envelope; further data measures the host's saturation-handling code path, not the model's predicted regime).
- I-2: two separate calibration runs, two separate envelopes: data/results/calibration/{localhost,multiprocess}/<host>_<ts>.json. Different transport stacks → different μ values.
- I-3: strict layer placement. src/calibration/ package + src/dimensional/dasaprof.py + src/methods/calibration.py (kept as thin orchestrator).
- Q-A: calibration is self-contained: calibration.json only. NO consumption of dflt.json / opti.json (that α-clamp belongs to the experiment method, not calibration).
- Q-B: result path drops the /experiment/ segment → data/results/calibration/<dpl>/.
- Q-C: payload_size_bytes stays at 128000 (128 kB).
- Q-D: NO clamp on os.cpu_count(): digital workers are software constructs; sweeping c=32 on a 16-core host is the intentional contention regime measurement.
- Q-E: samples_per_level: 1024 confirmed.
- Q-F: --dpl multiprocess MUST execute end-to-end. Real UvicornProcess-backed gauge in a separate OS process. Pulls SOA Phase A Stage A1 into calibration's scope as Stage C1.
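Decision I-1 amounts to a one-line stop predicate. A minimal sketch, assuming a hypothetical CellResult record holding per-cell sent and rejected counts (the real sweep's data structures are not shown in this log):

```python
from dataclasses import dataclass

# I-1 threshold: above this the cell has left the M/M/c/K validity envelope
_REJECT_LIMIT = 0.05


@dataclass(frozen=True)
class CellResult:
    """Hypothetical per-cell tally for one (c, K, rate) sweep cell."""

    sent: int
    rejected: int


def reject_rate(cell: CellResult) -> float:
    """Fraction of requests the cell rejected."""
    if cell.sent <= 0:
        _msg = f"cell has no sent requests: sent={cell.sent}"
        raise ValueError(_msg)
    return cell.rejected / cell.sent


def should_stop(cell: CellResult) -> bool:
    """True when the sweep should stop at this cell (strictly above 5%)."""
    return reject_rate(cell) > _REJECT_LIMIT
```

With samples_per_level at 1024 (Q-E), 52 rejections trips the stop while 51 does not.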
Sequence locked: calibration-first. I argued for calibration before SOA Phase A Stages A2-A9 because (a) the SOA experiment mesh's R1/R2/R3 verdicts require a multiprocess calibration envelope to subtract loopback overhead from — running SOA first against today's localhost calibration would systematically under-report multiprocess overhead by 50-200 μs of TCP loopback; (b) UvicornProcess is the same work in both plans (calibration C1 = SOA A1) and the calibration vernier is the simplest possible service for the Windows spawn spike; (c) skipping the calibration refactor leaves SOA A9's "dissertation-grade numbers" stop-gate unsatisfiable until calibration ships afterwards anyway. User accepted; sequence is C0-C11 → A2-A9.
11-stage calibration refactor plan + acceptance criteria + target package layout are all in notes/calibration.md — that file is the canonical task tracker. Every stage commit references it; every closed stage updates the progress table. After C11 closes, the soa-refactor.md A2-A9 queue resumes.
A0 of SOA Phase A is now retroactively DONE — today's earlier deployment-axis rename + folder restructure (this morning's "evening" devlog entry) was Stage A0's identifier sweep, completed before the C0 work landed. The loopback_aliased → multiprocess and local → localhost renames + the data/results/experiment/{localhost,multiprocess,remote}/ folder restructure all fall under A0 and are stored in this devlog's earlier 2026-05-03 (evening) entry.
Pickup at next session: Stage C1 — UvicornProcess spike against the calibration vernier. If Windows spawn semantics break the FastAPI app-factory pattern, the refactor halts at C1 and we revisit; otherwise, C2-C11 follows.
2026-05-03 (evening 2) — Code-review + architectural-conformance report on src/methods/calibration.py
Combined /code-report (seven-section diagnosis) with architectural-conformance.md (design-intent vs as-built) lenses on the calibration module. Report at notes/reports/code_review_calibration_2026-05-03.md. Diagnosis only — no code changed.
Headline finding: calibration was promoted from src/scripts/ to src/methods/ (2026-04-23) without resolving whether it is a method (one hypothesis, one run) or a precondition gate (host-floor probes). Over ~10 days it accreted a Route-B dimensional-card pipeline + a multi-combo sweep, becoming a 2636-line god module exposing 5+ public callables against a documented promise of 1-2 in CLAUDE.md "Module map". The PyDASA pipeline at _run_calib_pipeline (src/methods/calibration.py:1977) duplicates src/dimensional/engine.py::build_engine + src/dimensional/coefficients.py::derive_coefs — two derivation paths for the same coefficient family is the single material risk.
Eight recommendations (R1-R8) ranked:
- R1 (XS): document the as-built scope in CLAUDE.md "Module map" + notes/calibration.md so the drift becomes controlled drift.
- R2 (M): refactor _run_calib_pipeline to call src/dimensional/'s pipeline instead of duplicating it. Single source of truth for theta/sigma/eta/phi.
- R3 (S): move derive_calib_coefs to src/dimensional/calibration_card.py.
- R4 (S): move _DEFAULT_* constants out of module scope into run()'s body.
- R5 (XS): demote measure_* and _build_calib_* from documented public API.
- R6 (S): decide whether run_handler_stability_sweep is gate or diagnostic; fold or split.
- R7 (M): regression test asserting calibration card and dimensional-method coefficients agree within precondition gate.
- R8: defer structural changes until SOA refactor Phase A Stage A1 (vernier transport swap) lands.
Verdict: as-built state is acceptable for dissertation scope IF documented (R1). Recommend deferring all structural work until SOA Phase A closes, then bundle R2/R3/R4 as one calibration-cleanup pass. The headline risk (PyDASA pipeline duplication) can be neutralised by R2 alone — no file moves required.
Stage A0 of the two-phase SOA refactor (notes/soa-refactor.md) executed. Two coordinated changes landed in one sweep:
- Deployment-axis rename, two passes. First pass swapped loopback_aliased → multiprocess (the original Stage A0 plan); second pass renamed local → localhost so the deployment-axis literal matches its on-disk data/results/experiment/<dpl>/ folder name and reads as the universally-understood term for one-host loopback. Singular multiprocess (not multiprocesses) for register-consistency with localhost / remote. Sites: _VALID_DEPLOYMENTS tuples in src/methods/experiment.py + src/scripts/launch_services.py; data/config/method/experiment.json::deployment; TasArchitecture._gate_deployment + bind_addr; SvcRegistry._pick_host; every test class / method name + assertion / regex covering deployment values; tests/scripts/test_launch_services.py::test_localhost_all_short_duration. English local_services(), local_end_ts (CSV column, wire-schema off-limits), and "non-local routing" in services/base.py are NOT renames: those are domain English, not the deployment-axis literal.
- Folder restructure. Deleted data/{results,img}/experiment/{aggregate,baseline,local,localhost,loopback_aliased,multiprocesses,remote,s1,s2}/ (all stale, all untracked: pre-deployment-axis orphans + empty post-axis duplicates). Created data/{results,img}/experiment/{localhost,multiprocess,remote}/.gitkeep. Calibration preserved: data/results/experiment/calibration/ (47 host-keyed JSONs, ~3 min each) and data/img/experiment/calibration/ (10 PNG/SVG figures) untouched, because calibration measures the host's noise floor (loopback latency, jitter, handler scaling). That floor is the same number for localhost / multiprocess / remote runs on the same host, so triplicating it under each <dpl>/ would force a redundant 3-times re-calibration AND break the per-host gate in src/io/tooling.py::find_latest_calibration which globs one path keyed on socket.gethostname().
- Stop-gate: pytest tests/experiment/ tests/methods/ tests/io/ tests/scripts/ -> 386 passed, 2 failed (identical to baseline). The 2 failures are TestRampValidation::test_both_rates_and_rho_grid_raises and ::test_neither_raises: pre-existing regex mismatches from this morning's lambda_z anchor work, not regressed by the rename. Zero rename-induced failures.
Calibration-canary commit pinned for Stage A1. When runtime/uvicorn_process.py lands (Windows spawn spike), the same commit will switch src/methods/calibration.py::_register_vernier from UvicornThread to UvicornProcess(workers=1). Two reasons: (1) the vernier is the simplest c_srv=1, workers=1 case, so any spawn-related breakage surfaces against one service before touching the 13-service mesh; (2) once the experiment runs on UvicornProcess, calibration on UvicornThread would measure the noise floor through a different transport stack than the experiment uses, biasing the reported = measured - loopback_median ± jitter_p99 correction. Vernier stays at c_srv=1, workers=1 — workers=4 would fold worker-pool overhead into the floor.
Pickup at next session: Stage A1 = runtime/uvicorn_process.py + calibration vernier swap, in one commit.
Three discrete pieces of work landed today on top of the closed prototype-v2 reshuffle:
- Post-v2 cleanup pass. Public-alias enforcement for src.analytic (methods/experiment.py + methods/stochastic.py swapped to from src.analytic import ...); src/experiment/instances/gauge.py::build_gauge shipped to give the vernier service the same instances/ builder pattern as build_third_party (atomic) and build_tas (composite); methods/calibration.py::_build_ping_app and _build_vernier_app_for_combo refactored to use build_gauge. Three demo files fixed for the post-2026-05-01 service-layer protocol: demo_client.py migrated to TasUser; demo_services.py @logger(_ctx) factory replaced with the _DemoHandler callable-class pattern; demo_third_party.py _recorded_forward return-type annotation fixed to ExtFwdFn. Stop-gate: 387 passed.
- Experiment method anchored at lambda_z. Methodological fix: the experiment method was using a saturation-discovery ramp ([50, 100, 200, 300, 500]) that conflated calibration's job (find host ceiling) with the experiment method's job (validate at the design point). Methods 1-3 evaluate the network at lambda_z = 345; for method 4 to be apples-to-apples in 07-comparison.ipynb, it must measure at the same operating point. Fix: extended executor._resolve_rates and tooling._validate_ramp_block to accept a third drive spec (anchor: "lambda_z") alongside rates and rho_grid. The anchor form reads cfg.artifacts[entry_artifact].lambda_z and emits rates = [lambda_z]. experiment.json ramp block now defaults to anchor: "lambda_z"; 05-experimental.ipynb dropped its _NB_METHOD_CFG override entirely and now calls run_experiment(adp=a, wrt=True) directly. Stop-gate: 31 targeted tests passed.
- Calibration tuning + the bandwidth realisation. data/config/method/calibration.json bumped: sweep_grid.c[0]: 8 → 16, sweep_grid.K[0]: 16 → 128, plus rate-sweep acceleration (trials_per_rate: 11 → 7, max_probe_window_s: 2.0 → 1.5, inter_trial_delay_s: 3.0 → 1.5, rates: [10, 50, 200, 300, 400, 500, 510]). Result: calibrated rate stayed at 200 req/s, confirming gating wasn't the bottleneck. The remaining ceiling is bandwidth (128 KB payload on Windows loopback) × Python single-process (asyncio single-event-loop GIL serialisation). Group C (drop payload to 32 KB) and multi-worker uvicorn deferred pending the SOA refactor.
The methodological discovery driving the next stage: dpl="local" (in-process MockTransport mesh) is monolithic — 13 FastAPI app objects in one Python process sharing one event loop is not SOA. The DASA case-study claim ("dimensionally normalised coefficients characterise the architecture, not the implementation") only holds if the prototype is actually a distributed service mesh. Two-phase plan written into notes/soa-refactor.md:
- Phase A, Path 2: multi-process on localhost (dpl="multiprocesses"). Replace UvicornThread with multiprocessing.Process; per-PID SvcCtx + <service>__pid<PID>.csv log files; build_svc_df_from_logs merges per-worker CSVs; TasArchitecture connect-only mode (real httpx, healthz-poll, no in-process app mount); launcher subprocess autoload. ~2-3 days.
- Phase B, Path 3: LAN-distributed (dpl="remote"). Configuration on top of Phase A: same code, different experiment.json::hosts. Stages B1-B6 add --bind-host flag, wall_clock_offset_ns CSV header for cross-host time alignment, HTTP /_logs/<service> tarball endpoint for log collection. ~1 week (mostly setup + ops).
Critical invariant carried across both phases: per-service code is identical in local / multiprocesses / remote. Only SvcRegistry's host resolution + launcher orchestration + log-collection strategy differ. Phase A's whole point is to build the right abstractions so Phase B becomes a JSON edit + an SSH session, not another refactor.
Naming pinned the same day: dpl ∈ {local, multiprocesses, remote} reads monotone in distribution count. Renamed loopback_aliased → multiprocesses so the trio reads "single-process / multi-process-one-host / multi-process-many-hosts" — the meaningful axis, not the network-binding mechanism. Code-side rename sweep is part of Stage A0.
Phase A's four open design questions (G2) settled before any code lands:
- Per-worker seeding: fold PID into derive_seed(root_seed, f"{service}_pid{pid}"). Each worker has independent reproducible streams per (root_seed, pid) pair; run-envelope notes records all PIDs so post-hoc analysis can re-derive any stream.
- Workers per service default: uniform 4. --workers N CLI flag overrides. 13 × 4 = 52 worker processes per host (~7.8 GB RAM at 150 MB each on a 16-core box, ~3 workers per core).
- Launcher activation: autoload by default + launcher_started=True opt-out. Notebook gets autoload (subprocess.Popen from inside methods/experiment.py::run when dpl != "local"); CI / scripted bench / dissertation runs pre-launch with their own supervisor and pass launcher_started=True.
- Keep dpl="local": yes, marked explicitly as the dev/test mode. The methodological problem was using local AS the case-study runner; the solution is to stop doing that, NOT to delete local. Branch in __aenter__ between _init_routed_client + _mount_apps (local) and _init_real_http_client + _healthz_poll (multiprocesses).
Track 2 (test-suite simplification) deferred until Phase A closes. User flagged that @pytest.mark.slow would mis-categorise tests/stochastic/ (genuinely simulation-heavy by nature, not "live mesh" slow). When Track 2 reopens, the right marker name is @pytest.mark.live_mesh (precise — tests that spin up the FastAPI mesh, in-process or multi-process), NOT @pytest.mark.slow — same axis-naming rule as the multiprocesses rename: the marker should name the failure-mode dimension, not a coarse speed bucket.
Pickup at next session: Stage A0 (identifier sweep) → Stage A1 spike (Windows spawn for FastAPI app-factory, validated against the calibration vernier first) → G3 sign-off before touching the 13-service experiment mesh. Decision-log in notes/soa-refactor.md Stage A0; live state in memory/project_soa_refactor_planned_2026_05_03.md.
Eight-stage refactor reshaping src/experiment/ so the layering reads top-down: architecture.py (server) + users.py (client) compose into executor.py (cell driver), with wire/ (URL + payload concerns) and runtime/ (OS-boundary helpers) sitting underneath. Plan, status table, and per-stage notes in notes/prototype-v2.md.
Layout shift
src/experiment/
├── __init__.py # marker only; documents the two top-level ctxmgrs
├── architecture.py # TasArchitecture (server-side ctxmgr)
├── users.py # TasUser (client-side ctxmgr) — NEW
├── executor.py # execute_one + execute_sweep + build_svc_df_from_logs
├── client/ # client-side load-generator package (records / config / guard / sender / driver / stats / simulator)
├── instances/ # tas / third_party / common
├── services/ # atomic / composite / vernier / base / instruments
├── wire/ # NEW
│ ├── payload.py # generate_payload, resolve_size_for_kind
│ └── registry.py # SvcRegistry
└── runtime/ # NEW
├── async_loop.py # run_async_safe
├── os_timer.py # windows_timer_resolution
└── uvicorn_thread.py # UvicornThread
scanner.py and runner.py are gone. payload.py + registry.py now live under wire/. uvicorn_thread.py joined os_timer.py + async_loop.py under runtime/. users.py is new — the synthetic-user side of the prototype, deliberately decoupled from architecture.py (the executor pairs them). executor.py absorbed the scanner sweep + helpers + build_svc_df_from_logs. methods/experiment.py imports DOWN from executor.py directly; the scanner shim is deleted.
Stage outcomes (full per-stage table in notes/prototype-v2.md; final pytest is 387 passed at every multi-stage stop-gate):
| Stage | Action | Stop-gate |
|---|---|---|
| 0 | Baseline pytest + consumer inventory | 364 passed in 8:49 |
| 1 | wire/ (payload.py + registry.py) | 295 passed in 4:34 |
| 2 | runtime/ (async_loop.py + os_timer.py extracted from executor.py; uvicorn_thread.py moved in) | 16 runtime + 302 broader |
| 3 | users.py with TasUser ctxmgr (decoupled from architecture) | 260 passed in 32:84 |
| 3.5 | architecture.py + test_architecture.py alignment with the wire/runtime/users refactor | 258 passed |
| 4 | scanner.py absorbed into executor.py; quarantine shim left in place | 387 passed in 8:36 |
| 5 | Verify methods/experiment.py imports DOWN through the shim | (no code change) |
| 6 | Switch consumers to NEW import paths (executor + runtime, no scanner) | 387 passed in 9:25 |
| 7 | Delete scanner.py shim + clean __init__.py historical note | 387 passed in 8:51 |
| 8 | Devlog + memory entries (this entry) | (docs only) |
Patterns crystallised during the reshuffle (all pinned in .claude/skills/develop/coding-conventions.md + memory):
- One test class per source module. TestInit / TestActiveFlag / TestWaitReadyTimeout collapsed to single TestUvicornThread; same for TestTasArchitecture (was 4 classes), TestExecutor (was 3 classes). Class context plus prefix-disambiguated test names (test_resolve_rates_*, test_execute_one_*, test_sweep_*) carry the topic.
- __aexit__ underscore-prefix unused-args. Every async ctxmgr in the package signs async def __aexit__(self, _exc_type, _exc, _tb) -> None: with a docstring paragraph explaining the protocol-required-but-unused contract.
- Decompose long __aenter__ into named _step_x() helpers. TasArchitecture.__aenter__ shrank from a 100-line block of # step 1 / # step 2 runs to a 6-line table of contents calling _gate_deployment / _init_registry_and_specs / _resolve_entry_router / _init_routed_client / _mount_apps.
- while active: over while True: break. UvicornThread.wait_ready refactored to use a boolean instance flag with inline raise on the failure path; shutdown() clears the flag to release a concurrent poll early.
- Sibling ctxmgrs stay constructor-independent; the executor pairs them. TasUser does NOT import TasArchitecture; executor.execute_one is the only place that constructs both. Lets TasUser be driven against any compatible transport.
- No assert in src/ modules. assert gets stripped under python -O; production code uses explicit if cond: raise SomeError(_msg) for invariants. Pyright narrows after the raise.
- Behavioural tests over no-raise tests. test_os_timer.py rewrote 4 weak no-raise tests into 7 behavioural tests using unittest.mock.patch to verify winmm.timeBeginPeriod / timeEndPeriod are actually called in order, with the right period, and that timeEndPeriod runs even when the body raises.
- Build_svc_df_from_logs stays at the building-block layer. Original Stage-5 plan was to lift it UP into methods/experiment.py. Revised because BOTH execute_sweep and methods/experiment.py::run consume it; placing it at executor.py lets both import DOWN. The Stage-5 step degenerated to a verification.
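Three of the patterns above (underscore-prefixed __aexit__ args, __aenter__ as a table of contents, and raise SomeError(_msg) instead of assert) can be shown together in one toy context manager. DemoArchitecture, MeshError, and the step names are hypothetical; this is not TasArchitecture's code.

```python
import asyncio
from typing import List

_VALID_DEPLOYMENTS = ("localhost", "multiprocess", "remote")


class MeshError(RuntimeError):
    """Hypothetical error type for the sketch."""


class DemoArchitecture:
    """Toy async ctxmgr whose __aenter__ reads as a table of contents."""

    def __init__(self, deployment: str) -> None:
        self._deployment = deployment
        self.steps: List[str] = []

    async def __aenter__(self) -> "DemoArchitecture":
        self._gate_deployment()
        self._init_registry()
        await self._mount_apps()
        return self

    async def __aexit__(self, _exc_type, _exc, _tb) -> None:
        """Close the mesh; the three args are protocol-required but unused."""
        self.steps.append("closed")

    def _gate_deployment(self) -> None:
        # explicit raise survives python -O, unlike assert
        if self._deployment not in _VALID_DEPLOYMENTS:
            _msg = f"unknown deployment: {self._deployment!r}"
            raise MeshError(_msg)
        self.steps.append("gated")

    def _init_registry(self) -> None:
        self.steps.append("registry")

    async def _mount_apps(self) -> None:
        await asyncio.sleep(0)
        self.steps.append("mounted")


def run_demo(deployment: str) -> List[str]:
    """Enter and exit the ctxmgr once; return the recorded step order."""
    _arch = DemoArchitecture(deployment)

    async def _use() -> None:
        async with _arch:
            pass

    asyncio.run(_use())
    return _arch.steps
```

Each helper name states one step, so the order of setup is readable from __aenter__ alone and each step can be tested in isolation.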
Files added / removed
- Added: src/experiment/users.py (95 lines), src/experiment/wire/{__init__,payload,registry}.py, src/experiment/runtime/{__init__,async_loop,os_timer,uvicorn_thread}.py, tests/experiment/test_users.py, tests/experiment/wire/test_{payload,registry}.py, tests/experiment/runtime/test_{async_loop,os_timer,uvicorn_thread}.py.
- Removed: src/experiment/{payload,registry,scanner,uvicorn_thread,runner}.py, tests/experiment/test_scanner.py, the inline ClientSimulator-construction paths in test_architecture.py (migrated to TasUser).
src/experiment/client.py (~595 lines) was doing config + record + cascade detector + request sender + rate driver + stats + ramp orchestrator in one file, with the middle five collapsed into ClientSimulator. Split along responsibility lines while keeping a *__OLD__.py reference module + barrel shim for the deprecation window so consumers stay green at every step.
New layout (src/experiment/client/):
- records.py -> RequestRecord (renamed from InvocationRecord).
- config.py -> CascadeCfg / RampCfg / ClientCfg.
- guard.py -> StopGuard (renamed from inline cascade detector).
- sender.py -> RequestSender(client, registry, cfg, rng).send_one(kind).
- driver.py -> RateDriver(sender, guard, ramp_cfg, kind_names, kind_prob_norm, rng).run(rate): absolute-deadline batch loop.
- stats.py -> compute_probe_stats(records, counts, duration_s, rate, stop_reason, kind_names) (pure function).
- simulator.py -> lean ClientSimulator composing sender + guard + driver; walks the rate schedule.
- __init__.py barrel re-exports the public API plus deprecation aliases (InvocationRecord = RequestRecord, validate_ramp / build_ramp_cfg -> src.io.load_ramp_cfg).
JSON loader moved to src/io/tooling.py (parity with load_method_cfg):
- load_ramp_cfg(ramp) -> RampCfg (validates first).
- load_client_cfg(method_cfg, *, kind_prob) -> full ClientCfg.
- Both re-exported from src/io/__init__.py.
Quarantine pattern (per-stage safety): renamed the old module client.py -> client/client__OLD__.py and the old test file test_client.py -> tests/experiment/client/test_client__OLD__.py. The barrel pointed at OLD initially, then switched to NEW once every submodule + test landed. Old test file repointed its imports at client__OLD__.py directly so it kept testing OLD throughout the migration. Both *__OLD__.py files will be deleted in a follow-up commit once the deprecation window ends.
Naming choices:
- cascade.py / CascadeDetector -> guard.py / StopGuard (less metaphorical; "guard that says stop here").
- rate_driver.py / RateDriver -> driver.py / RateDriver (avoids name collision with services/instruments.py::LogProbe; the client side does NOT use AOP since we own both the call site and the response handling).
- RequestRecord per project acronym convention (req over invocation).
Architectural separation server-side vs client-side:
- services/instruments.py (@logger, LogProbe) wraps FastAPI handler __call__; needed because FastAPI owns the call site.
- client/driver.py + client/sender.py own their own loop + return value; no decorator needed. The asymmetry is intentional.
Iterative review pass over seven src+tests pairs, applying coding-conventions.md + code-documentation.md skills.
Modules touched (1:1 src ↔ tests):
- src/dimensional/sensitivity.py + tests/dimensional/test_sensitivity.py
- src/methods/dimensional.py + tests/methods/test_dimensional.py
- src/io/tooling.py + tests/io/test_tooling.py
- src/io/config.py + tests/io/test_config.py
- src/stochastic/simulation.py + tests/stochastic/test_simulation.py
- src/methods/stochastic.py + tests/methods/test_stochastic.py
- tests/conftest.py
Result: 71 dimensional tests, 17 io/config tests, 10 io/tooling tests, 9 stochastic-engine tests, 22 methods/dimensional tests, 10 methods/stochastic tests — all green.
- Module docstring *IMPORTANT:* framing demoted to prose ahead of the public-API list. The *IMPORTANT:* marker became visual noise once every module carried one; readers scanned past it.
- Trivial # ... labels dropped (# build the nodes, # run the engine, # header block, # unpack the cfg into per-node arrays). Informative why-lines kept and rewritten when the original described WHAT instead of WHY.
- raise X(_msg) extraction pattern applied consistently: compute _msg = f"..." on its own line, then raise ValueError(_msg). Long f-strings inside raise are hard to scan.
- Filtering list comprehensions decomposed to explicit for / if / append loops in src/stochastic/simulation.py::_summarise_replication (3 of them) and src/methods/stochastic.py::solve_net (the per-artifact _mu / _c / _K build). Simple single-purpose comprehensions kept.
- Test conventions tightened across all 6 test modules:
  - *IMPORTANT:* framing demoted in module docstrings.
  - "verifies that" / "verifies" framing dropped from class docstrings.
  - Test docstring lead-ins concretised to literal code-level claims (e.g. len(_a["coefficients"]) == 4, format_model_string(1, 10) == "M/M/1/10").
  - -> None added to every test method.
  - Fixture parameters typed (pytest.FixtureRequest, pytest.MonkeyPatch, Dict[str, Any]).
  - Test names tightened to short topic+outcome with acronyms; "when" filler dropped, prepositions preferred (on_ / without_ / from_); formula-form where appropriate (test_theta_partial_L_positive, test_W_net_close_to_analytic).
- Conftest fixtures fully typed. tests/conftest.py now declares Schema, AnalysisEngine, and Tuple[AnalysisEngine, Dict[str, Any]] returns; pyright's reportUnusedFunction false positives on pytest fixture-by-name injection silenced where the IDE flags them.
- Avoid dense / chained list comprehensions. Simple [_p.name for _p in paths] is fine. Filter+transform+nested-call combos and stacked / nested comprehensions decompose to explicit loops. Rule of thumb: non-trivial filter AND non-trivial expression → explicit loop. The user flagged this directly: "when you condense many commands in a list comprehension or multiple list comprehensions it's difficult to read; this means it's a programming antipattern."
- No dict(...) for kwarg packing across multiple call sites. Pyright widens every value to the union of all values, so _args = dict(mu=[10.0], lam_z=[5.0], K=[None], reps=2); fn(**_args) types every parameter as list[float] | list[None] | int, breaking type-check at the call boundary. Either inline the kwargs at each call site, or define a typed module-level helper. Surfaced when refactoring tests/stochastic/test_simulation.py::test_same_seed_same_summary.
- Test helpers move to module scope, not nested in test bodies. A def _helper(): ... inside a test method is a lazy definition that other tests can't reuse and that type-checkers can't see clearly. The user flagged the nested-helper case as "lazy definition, move outside" and the fix landed _run_single_node(*, lam_z, K, horizon, warmup, reps, seed=42) at module level so all five simulate_net(...) blocks across TestMM1Convergence / TestSeededReproducibility / TestBlockingBoundary reduce to 5-line kwarg calls.
Memory entries refreshed: feedback_no_filtering_list_comps.md rewritten with the density rule and explicit ✅/❌ examples (the original framing as "no filtering comprehensions" was too broad — the user clarified that simple ones are fine).
M CLAUDE.md
M .claude/skills/develop/coding-conventions.md
M notes/devlog.md (this entry)
M src/dimensional/sensitivity.py
M src/methods/dimensional.py
M src/io/tooling.py
M src/io/config.py
M src/stochastic/simulation.py
M src/methods/stochastic.py
M tests/conftest.py
M tests/dimensional/test_sensitivity.py
M tests/methods/test_dimensional.py
M tests/io/test_tooling.py
M tests/io/test_config.py
M tests/stochastic/test_simulation.py
M tests/methods/test_stochastic.py
M ~/.claude/.../memory/MEMORY.md
M ~/.claude/.../memory/feedback_no_filtering_list_comps.md
2026-05-01 (evening) — Layering fix: runner.py extraction breaks the experiment-architecture inversion
Final pass of the day, triggered by auditing src/experiment/architecture.py against the coding-conventions skill. Surfaced a dependency inversion that the previous lazy-import-in-function pattern had been masking: src/experiment/architecture.py::sweep_arch_exp was lazy-importing _run_async, _run_async_safe, _build_svc_df_from_logs from src/methods/experiment.py. src/experiment/ is the building-block layer, src/methods/<x>.py is the orchestrator layer; the arrow was pointing UP.
- New module src/experiment/runner.py (mesh-runner + log-postprocessing layer) with run_async, run_async_safe, build_svc_df_from_logs, windows_timer_resolution as public helpers. Bodies lifted verbatim from src/methods/experiment.py.
- src/methods/experiment.py::run() imports the three helpers from src.experiment.runner instead of defining them. File shrank 786 → 460 lines.
- src/experiment/architecture.py::sweep_arch_exp imports from src.experiment.runner at module top (no more lazy-import-in-function).
- Dropped the dead from src.experiment.architecture import sweep_arch_exp re-export in src/experiment/__init__.py (verified by grep that nothing imported via the package barrel, only direct module path). This was the original reason the cycle existed: loading any experiment/ sibling pulled in architecture transitively, which then needed methods/experiment.py.
After the extraction, both arrows point DOWN:
src/methods/experiment.py (orchestrator: replicate loop + envelope writing)
└─> src/experiment/runner.py (building block)
├─> src/experiment/launcher.py
├─> src/experiment/client.py
└─> src/analytic/jackson.build_rho_grid
src/experiment/architecture.py (building block: configuration sweep)
└─> src/experiment/runner.py (same)
The cycle disappeared because the broken arrow was gone, not because Python's import machinery was tricked.
Added a new bullet to CLAUDE.md and a longer version to .claude/skills/develop/coding-conventions.md:
Layering: arrows point DOWN. src/experiment/ is the building-block layer; src/methods/<x>.py is the orchestrator layer that uses those building blocks. A building-block module may NOT import from an orchestrator. If a building-block needs an orchestrator helper, the helper is misplaced and should be moved DOWN. Lazy-importing-in-function is a smell that preserves the inverted arrow, not a fix.
Companion rule in the skill:
Dead package re-exports hide layering bugs. Before reaching for a lazy import, grep for actual consumers of every name in __init__.py's __all__. A re-export that no one imports through the barrel is dead code AND a transitive-load trap.
- Inline f-string raises extracted to the _msg pattern: 3 in client.py::validate_ramp, 1 in launcher.py::get_lam_z_entry, 1 in payload.py::generate_payload, 1 in uvicorn_thread.py::wait_ready.
- Two broad except Exception: narrowed in client.py to (httpx.HTTPError, ConnectionError, OSError, asyncio.TimeoutError, ValueError) and (..., RuntimeError) for the task-drain path.
- Five stacked-# runs collapsed in client.py (R16: 9-line auto-batched-send rationale, 5-line Windows time.sleep recipe, 4-line batch-send block, 2-line drain budget, 6-line effective-rate explanation). Substantive content moved to one-line why-statements; long form preserved in notes/calibration.md and .claude/skills/develop/async-rate-precision.md.
- base.py docstring concretion (file the user said was "already done"; skill audit found gaps anyway): 2 typo fixes ("cheks" → "check", "inf_flight" → "in_flight"); 15 docstring lead-ins normalised to verb-first *name()* / **Name** form (every public symbol).
- registry.py SvcRegistry: dropped redundant Attributes: block (project convention is inline # why comments above each field, not a separate Attributes section).
- architecture.py docstring polish: 3 manual-wrapped sites unwrapped; units added to numeric Args (mu (float, req/s), c_int (int, server count), K_int (int, buffer capacity), mu_factor (float, unitless)).
- launcher.py: added *_is_entry_router()* docstring.
- test_uvicorn_thread.py created (was missing per "tests mirror src/ 1:1" rule); 3 unit tests for constructor + custom-host + timeout-raises. Lifecycle integration test deliberately omitted because pytest-asyncio's global asyncio.run patch lacks the loop_factory kwarg uvicorn passes on Python 3.12; the lifecycle test passed alone but failed in the full suite. Full lifecycle is exercised through test_launcher.py instead.
- 35 long test method names trimmed across 5 test files (e.g., test_empty_kind_weights_rejected_at_simulator_construction → test_empty_weights_rejected; test_above_threshold_trip_only_after_window_fills → test_trip_after_window_fills). Detailed contracts moved into *test_name()* docstring lead-ins.
- -> None returns added to every test method via a regex pass: 65+ test methods across the five test files.
- Two stacked-# runs collapsed in test_launcher.py.
End of pass: full tests/experiment/ suite at 209 passed; no stale identifiers anywhere; layering arrows all point DOWN.
Second-day pass that drove the morning's refactor outward. Code-level outcome: 86 service tests + 11 instance tests = 97 green; no external_forward / kind_to_target / parse_tas_idx / mark_admit_time / mark_local_end identifiers remain anywhere outside notes prose and .gitignore.
| Symbol | Before | After | Cascade |
|---|---|---|---|
| `kind_to_target` | composite param + KindPick field + tas.py param + demo kwargs | `kind_to_tgt` | instances/tas.py, tests/experiment/instances/test_tas.py (positional, no edit), src/scripts/demo_services.py (kwarg + comment), launcher.py left alone (passes positionally) |
| `_default_route_for` | composite top-level | `_build_route` | composite-internal only |
| `parse_tas_idx` | composite top-level public | `_parse_constituent_idx` (private) | tests/experiment/services/test_composite.py import + TestParseTasIdx class renamed TestParseConstituentIdx |
| `external_forward` | `instances/tas.py::build_tas` and `instances/third_party.py::build_third_party` params | `ext_fwd` | 3 kwarg sites in tests/experiment/instances/test_third_party.py, 3 kwarg + 1 prose comment + 1 print label in src/scripts/demo_services.py, 3 kwarg sites in src/scripts/demo_third_party.py; launcher.py untouched (positional) |
Every test file under tests/experiment/services/ and tests/experiment/instances/ is now uniform:
- `from tests.utils.helpers import _SpecBuilder` import; local `@pytest.fixture def specs() -> _SpecBuilder` wrapper; test signatures `(self, specs: _SpecBuilder) -> None`.
- `_recorded_forward(calls)` closure factory replaced by `class _RecordedForward` callable with explicit `self.calls` field. `_noop_forward` / `_raising_forward` stay as top-level async functions (stateless).
- Pytest fixture return types annotated (`Tuple[FastAPI, SvcSpec]`, `Tuple[FastAPI, Dict[str, SvcSpec], List[Tuple[str, str]]]`, etc.).
- Test method names trimmed to short topic+outcome form (`test_returns_503_when_in_flight_exceeds_K` → `test_503_at_K_capacity`; `test_request_flows_through_three_tas_components` → `test_three_hop_chain`; etc.). Long-form contract narrative moved into `*test_name()*` docstring lead-ins.
- Test class renames: `TestExternalForwardOnlyToThirdParty` → `TestExternalForward`; `TestParseTasIdx` → `TestParseConstituentIdx`.
- Chained comparisons (`a <= b <= c`) decomposed into `_con_1 = a <= b; _con_2 = b <= c; assert _con_1 and _con_2`.
The most load-bearing finding of the day. Two regimes by scope:
High-level scope (class / module / public function): natural-language descriptions of what the abstraction does and why a reader cares. Do not recite method calls. The audience is someone reading the call site, not someone reading the body.
- Bad: ``*AtomicHandler* run a fixed sequence per request: try the K admission counter, acquire one of `spec.c` permits, sleep `ctx.draw_svc_time()` seconds, draw a Bernoulli at rate `spec.epsilon`...``
- Good: `*AtomicHandler* simulate one service node of the queueing network. Each call enforces capacity limits, waits a service-time draw, may fail on the configured Bernoulli, and either terminates or hands the request off to a downstream node.`
Narrow scope (test methods, private helpers): literal code-level claims. The docstring matches what pytest will print on failure, so a debugger reading the docstring sees the same expression they'd be debugging.
- Bad: `every composite member gets a distinct SvcCtx exposed on app.state.tas_components.`
- Good: `` `set(app.state.tas_components.keys()) == set(specs.keys())` and `len({id(s) for s in tas_components.values()}) == len(specs)`. ``
Other concretion rules applied uniformly:
- No circular self-reference. A class named `CompositeDispatch` does not say "callable `dispatch` that..."; a function named `_jackson_pick` does not say "default `pick_tgt`: Jackson-weighted choice"; describe what the thing does in the reader's terms.
- Drop programmer jargon: "delegate to" → "call"; "stash" → "store"; "land a failure row" → "append a failure row"; "scratchpad" → "log". Plainer verbs travel further.
- `raise SomeError(_msg)` with the message extracted first: `_msg = f"..."` on its own line, then `raise ValueError(_msg)`. Long f-strings inside the `raise` are hard to scan.
composite.py::_parse_constituent_idx had _mdg = f"not a TAS component name: {name!r}"; raise ValueError() — message built into a local but never passed. Fixed to _msg = f"..."; raise ValueError(_msg) (the canonical pattern this sweep also codified). Test test_non_tas_name_raises was matching the exception type only, so the bug was silent.
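A minimal sketch of why the dropped message went unnoticed, and what a message-aware check looks like. The regex and the `TAS_{n}` name shape are assumptions for illustration; only the `_msg` pattern and the error text come from the entry above.

```python
import re

# Hypothetical name shape ("TAS_{3}"); the real parser may differ.
_TAS_NAME = re.compile(r"TAS_\{(\d+)\}")


def parse_idx_buggy(name: str) -> int:
    """Builds the message but never passes it to the exception."""
    _match = _TAS_NAME.fullmatch(name)
    if _match is None:
        _msg = f"not a TAS component name: {name!r}"
        raise ValueError()  # bug: _msg is dropped
    return int(_match.group(1))


def parse_idx_fixed(name: str) -> int:
    """Canonical pattern: extract the message, then raise with it."""
    _match = _TAS_NAME.fullmatch(name)
    if _match is None:
        _msg = f"not a TAS component name: {name!r}"
        raise ValueError(_msg)
    return int(_match.group(1))


def raised_message(fn, name: str):
    """Return the ValueError text raised by fn(name), or None if no raise."""
    try:
        fn(name)
    except ValueError as _exc:
        return str(_exc)
    return None
```

A type-only assertion passes for both versions. Matching on the text, for example with `pytest.raises(ValueError, match="not a TAS component name")`, would have failed against the buggy one.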
Swept src/experiment/services/ end-to-end. Key shift: @logger no longer reads from module-level ContextVars; per-invocation state flows through an explicit LogProbe dataclass the decorator threads as the third arg of the wrapped method. Atomic / vernier / composite were migrated to callable-class handlers (no nested def-in-mount-fn). Param renames applied (external_forward → ext_fwd, pick_target → pick_tgt, kind_to_target → kind_to_tgt, target → tgt inside CompositeDispatch and the atomic default dispatch). Test helpers consolidated under tests/utils/helpers.py::_SpecBuilder; closure-factory forwards (_recorded_forward) replaced by callable classes; test method names trimmed and contract narrative moved into docstring lead-ins. 86/86 service tests green at end of pass.
| File | Change |
|---|---|
| `src/experiment/services/instruments.py` | `mark_admit_time` / `mark_local_end` deleted. New: `LogProbe` dataclass, `stamp_admit() -> int`, `stamp_local_end() -> int`. `logger(func)` wraps `(self, req, probe)` and exposes `(self, req)` to FastAPI (no `__wrapped__`). |
| `src/experiment/services/atomic.py` | `_AtomicHandler` → `AtomicHandler` (callable class with `@logger` on `__call__`). Defaults `_jackson_pick(self) -> Optional[str]` (`req` param dropped; it was unused) and `_external_dispatch(self, tgt, req)` are methods on the class. `mount_atomic_svc` instantiates and registers. |
| `src/experiment/services/vernier.py` | `_VernierHandler` → `VernierHandler`. `__call__(self, req, probe)`. Uses `probe.admit_ts = stamp_admit(); probe.c_used_at_start = self.ctx.c_in_use`. |
| `src/experiment/services/composite.py` | `_CompositeDispatch` → `CompositeDispatch`, `_KindPick` → `KindPick`. Param renames in `mount_composite_svc`. `parse_tas_idx` → `_parse_constituent_idx` (now private). |
| `src/experiment/instances/tas.py` | Kwargs `external_forward=` → `ext_fwd=` and `kind_to_target=` → `kind_to_tgt=` at the `mount_composite_svc` call site. `build_tas`'s own param name unchanged. |
| `src/experiment/services/__init__.py` | `mount_vernier_svc` re-enabled (was temporarily disabled while vernier still used the old API). |
| `tests/utils/helpers.py` | New `_SpecBuilder` callable class with kwargs-only `__call__` returning `SvcSpec`. Imported by every service test file. |
| `tests/experiment/services/test_atomic.py` | `_RecordedForward` callable class replaces the closure-factory `_recorded_forward`. `_noop_forward` and `_raising_forward` promoted to module-level async functions (no nested def inside the test). All test methods carry `(self, specs: _SpecBuilder) -> None`. Test names shortened (`test_terminal_returns_success_and_logs_one_row` → `test_terminal_success_row`, etc.). |
| `tests/experiment/services/test_instruments.py` | Rewritten to test the method-form `@logger` via a `_LoggedProbe` test class with `__call__(self, req, probe)`. New `TestStampHelpers` covers `stamp_admit` / `stamp_local_end` monotonicity. |
| `tests/experiment/services/test_composite.py` | Same patterns: `_RecordedForward` class, `specs` fixture from helpers, typed signatures, `TestParseConstituentIdx` class renamed to match the now-private function. |
ContextVars carried per-task timestamps via module-level globals; the decorator wrote set(None) before each call and get() after. Worked, but the data flow was hidden — mark_admit_time returned None and side-effected into a global. The probe makes the channel explicit: the decorator creates one, threads it, reads its fields. Same per-task isolation (probe is local to the wrapper), no ContextVar coupling, no global state. Trade-off: __call__ signature is (self, req, probe) instead of (self, req); FastAPI sees the wrapper's 2-arg signature, which is why we don't set __wrapped__ (otherwise inspect.signature(callable, follow_wrapped=True) walks back to the inner 3-arg method and FastAPI tries to bind probe from the request body).
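The probe-threading idea can be sketched as below. This is an illustration under assumptions, not the project's `instruments.py`: the `local_end_ts` field, the `rows` list and the `EchoHandler` class are hypothetical, and the real decorator may record more than two stamps.

```python
import asyncio
import inspect
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogProbe:
    """Per-invocation channel between the handler body and the decorator."""
    admit_ts: Optional[int] = None
    local_end_ts: Optional[int] = None  # hypothetical field name


def stamp_admit() -> int:
    return time.perf_counter_ns()


def stamp_local_end() -> int:
    return time.perf_counter_ns()


def logger(func):
    """Wrap a (self, req, probe) method and expose (self, req)."""
    async def wrapper(self, req):
        probe = LogProbe()  # local to this call: per-task isolation
        result = await func(self, req, probe)
        self.rows.append((probe.admit_ts, probe.local_end_ts))
        return result
    # Deliberately no functools.wraps: that would set __wrapped__ and
    # signature inspection would walk back to the 3-arg method.
    return wrapper


class EchoHandler:
    """Hypothetical callable-class handler used only for this sketch."""

    def __init__(self) -> None:
        self.rows = []

    @logger
    async def __call__(self, req, probe):
        probe.admit_ts = stamp_admit()
        probe.local_end_ts = stamp_local_end()
        return {"echo": req}


handler = EchoHandler()
print(asyncio.run(handler("ping")))
print(list(inspect.signature(handler).parameters))
```

Signature inspection on the instance reports only `req`, which is the shape a framework binding request data would need to see.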
mount_atomic_svc previously held a nested @logger(_ctx) async def _handler(req): ... plus two more nested def pick_target(...): ... / def dispatch(...): ... for defaults. Three closures over the mount-fn's parameters; the inner pick_target shadowed the outer parameter of the same name. Replaced by callable classes with explicit fields; mount-fn shrinks to instantiate + register. Same call ergonomics from the FastAPI side (callable instance is a callable; signature inspection on a class instance reads __call__ minus self).
- Stateless helpers as plain functions, stateful as callable classes. `_jackson_pick(self) -> Optional[str]` is stateless beyond `self`; `_RecordedForward(calls)` is stateful. Don't add a class wrapper around something that has only `__call__` and no `__init__` work: that's a function in disguise.
- Test helpers go in `tests/utils/helpers.py`, not `conftest.py`. Tests import what they need. Each test file may still wrap an imported callable class in a local pytest fixture (`def specs() -> _SpecBuilder: return _SpecBuilder()`); fixtures stay test-file-local, the class is shared.
- Test method names: short topic + outcome. `test_503_at_K_capacity`, not `test_returns_503_when_in_flight_exceeds_K`. The `*test_name()*` docstring lead-in carries the long-form contract.
- Decompose chained comparisons. `assert a <= b <= c` is the same compact-multi-condition form the no-inline-ternary rule targets; split into `_con_1 = a <= b; _con_2 = b <= c; assert _con_1 and _con_2`.
- Acronyms-everywhere policy still applies. When a parameter name shadows English prose used in surrounding comments / docstrings (`external_forward`, `pick_target`), rename to the acronym form (`ext_fwd`, `pick_tgt`) so grep separates code from prose.
2026-04-30 (later) — Notes consolidation: proof + experiment + InfoQ -> procedure.md + new MVA skill
Three notes/ files merged or relocated to bring the methodology / case-study split into clean alignment per the experimental-design skill's authority chain.
| Action | Path | Reason |
|---|---|---|
| DELETED | `notes/proof.md` | content absorbed into `notes/procedure.md` |
| DELETED | `notes/experiment.md` | content absorbed into `notes/procedure.md` |
| MOVED | `notes/architecture_experimentation.md` -> `assets/docs/architecture_experimentation.md` | full InfoQ summary belongs in `assets/` (matches the precedent set by `assets/docs/operational_analysis.md`); not project-specific content |
| CREATED | `notes/procedure.md` | CS-01 instantiation of the four-piece experimental-design discipline (hypothesis -> model -> prototype -> validation); single document for both the falsifiable claims AND the procedure that tests them |
| CREATED | `.claude/skills/design/mva-framework.md` | NEW skill seeded from the InfoQ summary; architectural-experiment subset of experimental design (distinct from `experimental-design.md`'s authoritative four-piece flow) |
.claude/skills/design/experimental-design.md (authoritative — four-piece methodology)
complemented by
.claude/skills/design/mva-framework.md (distinct subset — MVA framing per Pureur & Bittner)
complemented by
.claude/skills/develop/architectural-experiments.md (prototype-side discipline)
instantiated by
notes/procedure.md (CS-01)
referenced from
notes/prototype.md, notes/comparison.md, notes/calibration.md
On any conflict between procedure.md and a skill, defer to the skill.
The previous notes/ layout mixed two layers:
| Layer | Purpose | Lives in |
|---|---|---|
| Methodology | project-agnostic experimental-design discipline; reusable across CS-1, CS-2, future cases | .claude/skills/design/ + assets/docs/ |
| Case-study content | CS-01 hypotheses, procedure, prototype, validation plan | notes/ |
architecture_experimentation.md was methodology (Layer A) misfiled in Layer B's directory. proof.md and experiment.md were two views of the same Layer-B content (hypothesis vs. procedure for CS-01) and merging them removes a redundant cross-reference axis. The new MVA skill captures the InfoQ-derived framing so future case studies can apply it without copying content.
- `project_proof_framework_2026_04_30.md`: file-path reference updated from `notes/proof.md` to `notes/procedure.md`
- `MEMORY.md` index: same path update
- `.claude/skills/design/experimental-design.md` and `.claude/skills/develop/architectural-experiments.md`: to add a one-line cross-reference to `mva-framework.md` so the authoritative chain is explicit (deferred; non-blocking)
Was 14 files / ~430 KB; now 12 files / ~330 KB. No load-bearing content lost; cross-references collapsed; methodology / case-study boundary respected.
| Reference | Where | Fix |
|---|---|---|
| `notes/proof.md` | memory entries (already updated), some skill cross-refs | search-replace to `notes/procedure.md` |
| `notes/experiment.md` | several `notes/*.md` cross-references; `CLAUDE.md` | search-replace to `notes/procedure.md` |
| `notes/architecture_experimentation.md` | `notes/proof.md` (now deleted), some memory entries | search-replace to `assets/docs/architecture_experimentation.md` (or to the new `.claude/skills/design/mva-framework.md` for skill-style references) |
These are find-and-replace fixes. User explicitly said link-breakage repair is the least difficult item and not a blocker.
Articulated the dissertation-grade proof structure in notes/proof.md. Two independent falsifiable axes:
| Axis | Hypothesis | Falsifier |
|---|---|---|
| H1 predictive | DASA's dimensional viable region on the Yoly chart bounds prototype configurations satisfying R1∧R2∧R3 | Predicted-viable config fails Cámara; predicted-infeasible passes |
| H2 congruent | The four methods (analytic, stochastic, dimensional, experimental) agree within DASA-side tolerance for every (c, K, μ, λ) | Any pairwise residual exceeds tolerance |
Two-stage structure with completely different tolerance semantics:
- Stage 1, calibration gate (≤ 5 % noise floor): precondition for experimentation, NOT a hypothesis-test tolerance. Captures irreducible host noise outside the model's abstraction. Already implemented (envelope's `baseline` block stamped on every experiment result).
- Stage 2, real experiments at DASA-side tolerance: tests H1 + H2 against the model's own approximation budget (Markovian assumption, 2nd-order ignored effects, MC variance). NOT against host noise.
User correction on the framing — three things were initially conflated and are now separately captured:
- Calibration error vs model error: I had pinned `±5 %` as a hypothesis tolerance. It is a precondition gate, not a tolerance. Memory: `feedback_calibration_vs_model_error.md`.
- `data/config/` (input) vs `data/results/` (output): I labelled result JSONs as "configs" in proof prose. They are run outputs of `<method>.run()`. Memory: `feedback_data_paths_input_vs_output.md`.
- Tests / functional replication ≠ experiments: Software-architecture community routinely calls unit / functional tests "experiments" because all three involve running code and comparing output. Distinguishing question: what would falsify the activity? Cámara 6-decimal replication is a unit test of the analytic solver, not validation of DASA's predictive claim. Memory: `feedback_test_vs_experiment_distinction.md`.
- `.claude/skills/design/experimental-design.md`: added "Calibration is a precondition gate, NOT a hypothesis tolerance" subsection; added "Tests / functional replication are NOT experiments" subsection; new anti-patterns (calibration-as-tolerance, replication-as-validation, config/results path inversion).
- `.claude/skills/develop/architectural-experiments.md`: extended Principle #1 with replication-≠-validation paragraph; new anti-patterns + reviewer checklist items for calibration gate and config/results separation.
- `notes/proof.md` (NEW): formal proof structure with two-stage tolerance discipline
- `notes/architecture_experimentation.md` (existing): InfoQ MVA piece reference
- `notes/devlog.md`: this entry
- `.claude/skills/design/experimental-design.md`: three new subsections
- `.claude/skills/develop/architectural-experiments.md`: replication-≠-validation paragraph + 2 new anti-patterns + 2 new checklist items
- `memory/MEMORY.md` (index): 4 new entries indexed at top
- `memory/project_proof_framework_2026_04_30.md` (NEW)
- `memory/feedback_calibration_vs_model_error.md` (NEW)
- `memory/feedback_test_vs_experiment_distinction.md` (NEW)
- `memory/feedback_data_paths_input_vs_output.md` (NEW)
- Articulate the model's approximation budget → DASA-side tolerance numerical
- Build method 5 (`comparison.py`), currently a skeleton in `notes/comparison.md`
- Define hypothesis-set operating points formally (validity envelope: ρ < 1, finite K, Markovian)
- Extend `plot_yoly_chart` with viable-region shading
- Define DASA viable-region predicate from R1/R2/R3
Driven by the "how do I get μ=1600 req/s on this host?" question. The host's per-worker μ ceiling on DESKTOP-INKGBK6 (Windows + uvicorn TCP loopback) is ~290 req/s — Cámara canonical artifacts (AS_{3}.μ=1580) cannot be served by a single physical worker. Today landed five interlocking changes.
run_rate_sweep (src/methods/calibration.py) was driving the full TAS mesh per trial — 13 services up + down + cascade-detection on the experiment.run envelope. Rewrote it to drive the standalone ping/echo vernier: one server reused across all rates × trials, achieved rate = samples / window_s. Loss = (target - achieved) / target × 100. Decoupled from any TAS profile, no entry_service coupling, no experiment.run recursion.
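The two rate-sweep formulas reduce to a few lines. This is a sketch under assumptions: the function names are hypothetical and the pass-band helper only mirrors the `abs()`-style check this log mentions elsewhere, not the real implementation.

```python
def achieved_rate(samples: int, window_s: float) -> float:
    """Achieved rate in req/s: completed samples over the probe window."""
    return samples / window_s


def loss_pct(target: float, achieved: float) -> float:
    """Loss = (target - achieved) / target x 100, in percent."""
    return (target - achieved) * 100.0 / target


def within_band(loss: float, target_loss_pct: float) -> bool:
    """Hypothetical symmetric pass-band check on the absolute loss."""
    return abs(loss) <= target_loss_pct


_rate = achieved_rate(samples=195, window_s=1.0)
_loss = loss_pct(target=200.0, achieved=_rate)
print(_rate, _loss, within_band(_loss, 2.5))
```

A negative loss means the driver overshot the target rate, which is why a symmetric band is the natural check.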
Dropped: adaptation, min_samples, cascade_*, entry_service, with_lambda_z kwargs + _read_lambda_z_at, _run_single_rate_probe, _summarise_rate_trial, _print_rate_trial_row helpers + matching CLI flags. Trimmed data/config/method/calibration.json::rate_sweep to just {rates, trials_per_rate, max_probe_window_s, target_loss_pct}. Test rewritten to monkey-patch the new _run_rate_sweep_async orchestrator. 24/24 tests pass in 90 s.
Notebook section 7 markdown updated: "Drives the standalone vernier ping/echo service ... Pure host-transport saturation, decoupled from the TAS profile (full-mesh saturation testing belongs in the experiment notebook itself)." Section 6b retitled "Single-worker push-back card (closed-loop)"; section 7b "Multi-worker rate-driven sweep (open-loop)". "Route B" jargon stripped (internal-only term).
The artifacts vs specs split (per notes/qn_config_conventions.md) lets artifacts.json::* stay frozen at Cámara values while specs.json::* carries the deployable knobs. Today's recipe applied to both dflt.json::specs and opti.json::specs:
| Artifact | artifacts μ | specs (c · μ) | aggregate | headroom |
|---|---|---|---|---|
| TAS_{1..6} | 700 | 4 · 250 | 1000 | +43% |
| MAS_{1} | 180 | 1 · 180 | 180 | host can deliver |
| MAS_{2} | 530 | 3 · 250 | 750 | +42% |
| MAS_{3} | 150 | 1 · 150 | 150 | host can deliver |
| AS_{1} | 700 | 3 · 250 | 750 | +7% (tight) |
| AS_{2} | 410 | 2 · 250 | 500 | +22% |
| AS_{3} | 1580 | 8 · 250 | 2000 | +27% |
| DS_{3} | 550 | 3 · 250 | 750 | +36% |
| MAS_{4} (opti) | 880 | 4 · 250 | 1000 | +14% |
| AS_{4} (opti) | 210 | 1 · 210 | 210 | host can deliver |
| DS_{1} (opti) | 250 | 2 · 250 | 500 | +100% margin |
All K=10. Drift between artifacts.c=1, μ=1580 (analytic / stochastic / dimensional predictions) and specs.c=8, μ=250 (experiment delivery) is the dimensional case-study finding — η = X·K/(μ·c) shifts ~10× because c·μ is held but per-server μ drops 6×. That drift IS the story, not a bug.
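The headroom column can be reproduced from the other columns, assuming headroom is the aggregate `c · μ` relative to the frozen artifact μ. The function name is hypothetical; the numbers are rows from the table above.

```python
def headroom_pct(artifact_mu: float, spec_c: int, spec_mu: float) -> float:
    """Percent by which the deployable aggregate c * mu exceeds artifact mu."""
    aggregate = spec_c * spec_mu
    return (aggregate / artifact_mu - 1.0) * 100.0


# (artifact mu, spec c, spec mu) taken from the table rows.
_rows = {
    "TAS_{1..6}": (700, 4, 250),
    "MAS_{1}": (180, 1, 180),
    "AS_{1}": (700, 3, 250),
    "AS_{3}": (1580, 8, 250),
}

for _name, (_mu_art, _c, _mu_spec) in _rows.items():
    _h = headroom_pct(_mu_art, _c, _mu_spec)
    print(f"{_name}: aggregate={_c * _mu_spec} headroom={_h:+.0f}%")
```

Rows marked "host can deliver" come out at zero headroom because a single worker serves the artifact μ directly.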
UDS transport upgrade (Path C — μ ≈ 2000 per worker on Linux UDS) deferred to remote-distribution stage; full plan in notes/distribute.md::Section 12 and project_uds_transport_deferred.md memory.
Killing a sweep mid-flight (Ctrl-C, kernel crash, nbconvert timeout) was leaving uvicorn workers orphaned. Three layers of defense:
- Per-combo cleanup: `_drive_one_combo` already wrapped in `try / finally _server.shutdown()`. Normal flow.
- atexit hook (NEW): module-level `_ACTIVE_VERNIERS: weakref.WeakSet[_UvicornThread]` tracks every vernier started via `_register_vernier(_UvicornThread(_app, port))`. `atexit.register(_shutdown_active_verniers)` walks the registry on graceful interpreter exit. Catches the case where a sweep crash bypasses the per-combo `finally`.
- `daemon=True` on `UvicornThread` (`src/experiment/uvicorn_thread.py:56`): ensures the parent process can always exit even if `shutdown()` deadlocks or `join(timeout=5.0)` times out.
What's still outside our control: taskkill /F (SIGKILL-equivalent on Windows) bypasses atexit. The kernel still owes a 30 s TIME_WAIT cooldown per closed TCP connection — port 8765 stays in TIME_WAIT for ~30 s after a hard kill before re-bindable. That's the irreducible "leak window".
run_calib_sweep was passing args.uvicorn_backlog (16384 default) to derive_calib_coefs per combo, leaking host-default K into every combo card. Each combo's K array became [16384] * lambda_steps regardless of sweep_grid.K = [16, 32, 64, 128]. Fixed by passing _K_val (combo's K) and K_values=[int(_K_val)] explicitly. Smoke-test confirmed K_array == [combo_K] * lambda_steps post-fix.
- `plot_calib_rate_sweep`: now draws BOTH `+target_loss_pct` and `-target_loss_pct` horizontal bars (was only `+target`). Annotations: `+2.5%` / `-2.5%`. Pairs with the `abs()` check in `_find_highest_sustainable_rate` so the visualisation matches the calibrated-rate pass-band semantics.
- `plot_yoly_chart` auto-scaling footer: `footer_h = max(0.18, 0.04 + ceil(N/4) × 0.018)` where N is `len(scenarios)` / `len(paths)` (grouped mode) or `_estimate_single_mode_count(coeff_data) = unique(c) × unique(μ)` (single mode). New helper `_estimate_single_mode_count` looks up the first `c_*` and `\mu_*` arrays via prefix-match. Architecture yoly (16 entries) keeps `footer_h=0.18`; calibration sweep (48 entries) grows to `footer_h=0.256`. Without this, the 48-entry legend overflowed the body.
- Calibration sweep label includes μ: cell `nb-calib-sweep-plot` was building scenario labels as `f"c={c} K={K}"`, collapsing 4 mu_factor variants per (c, K) into one dict entry (only 12 of 48 combos visible). Fixed: label now includes `μ` (bold-math) read from `meta.mu_req_per_s`. 48 unique scenarios now render.
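The footer formula is easy to check against the two figures quoted above. The function and parameter names are hypothetical; the constants are the ones in the formula.

```python
import math


def footer_height(n_entries: int,
                  per_row: int = 4,
                  base: float = 0.04,
                  row_h: float = 0.018,
                  floor: float = 0.18) -> float:
    """footer_h = max(floor, base + ceil(N / per_row) * row_h)."""
    _rows = math.ceil(n_entries / per_row)
    return max(floor, base + _rows * row_h)


for _n in (16, 48):
    print(_n, round(footer_height(_n), 3))
```

With 16 entries the computed value (0.112) sits below the floor, so the footer stays at 0.18; with 48 entries the 12 legend rows push it to 0.256.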
- `src/methods/calibration.py`: full rate-sweep rewrite, atexit cleanup, K-fix, helper deletions
- `src/experiment/uvicorn_thread.py`: `daemon=True` (already present, confirmed)
- `src/view/charter.py`: auto-scaling footer in `plot_yoly_chart`, `_estimate_single_mode_count` helper
- `src/view/characterization.py`: symmetric `±target_loss_pct` lines in `plot_calib_rate_sweep`
- `data/config/method/calibration.json`: `rate_sweep` block trimmed; `entry_service` removed
- `data/config/profile/dflt.json::specs`: 13 artifacts updated to host-bound bin-packing
- `data/config/profile/opti.json::specs`: same 13 + 3 swap-slot upgrades (MAS_{4}, AS_{4}, DS_{1})
- `tests/methods/test_calibration.py`: `_aggregate_rate_trials` test updated, orchestration test rewritten to monkey-patch `_run_rate_sweep_async`
- `00-calibration.ipynb`: section 6b/7b reframed (closed-loop vs open-loop), label fixes, RUN_CALIB_SWEEP toggle defaults to False, "Route B" mentions removed
- `notes/distribute.md::Section 12`: UDS upgrade deferred plan
- `CLAUDE.md`: Method Module Conventions: rate-sweep decoupling note, specs-binpacking recipe, zombie-cleanup three-layer pattern; View Conventions: auto-scaling legend, symmetric loss-band, K-fix
- `notes/titles_std.md`: section 6b/7b retitled in audit table
- Memory: `project_calibration_2026_04_28.md` (NEW), `project_uds_transport_deferred.md` (existing), MEMORY.md indexed
- 24/24 calibration unit tests pass.
- Live smoke test: 1-combo sweep wrote `K_array=[32, 32, 32]` and `meta.K_capacity=32` (was 16384 before fix).
- Live ping-only rate sweep: 3 rates × 2 trials × 1 s window finished in 12 s (vs ~30 min for the old TAS path).
- Plotter smoke tests: yoly chart with 16 entries (architecture) keeps `footer_h=0.18`; with 48 entries (calibration sweep) renders cleanly with `footer_h=0.256`.
- Re-run the multi-combo sweep with the current grid to refresh the on-disk `*_sweep.json` (the 01:04 file predates the K-fix). Current sweep_grid: `c=[8,16,32,64], K=[16,32,64,128], mu_factor=[0.5,1,1.5,2]` = 64 combos. Recommend `inter_trial_delay_s=3.0` (was 0.3) to give TIME_WAIT room between combos.
- Notebook re-run sequence (00→04) was paused at 00 due to TIME_WAIT exhaustion at high-c combos in the previous (cancelled) attempt; resume after a fresh terminal session has fully drained zombies.
Follow-up session refining the yoly suite (plot_yoly_chart, plot_yoly_space, plot_yoly_arts_hist, plot_yoly_arts_charts, plot_yoly_arts_behaviour) plus the calibration dim card. Driven by user feedback on rendered images.
Changed from \n to , across the five thin notebooks (00-04) so titles stay on one line: f"{Method}: {Subject}, {Scenario}". 32 title strings rewritten across notebooks plus 35 prose references in notes/titles_std.md, CLAUDE.md, the memory entry, and the notebook-editing skill.
Discovered the hard way that matplotlib's built-in mathtext does NOT recognise \boldsymbol; it crashes savefig with ParseFatalException: Unknown symbol: \boldsymbol. The smoke-tests passed because in-memory figure creation skips the tick-bbox path that triggers the parser; the failure only surfaced when the user ran the notebook end-to-end. Reverted 111 occurrences of \boldsymbol → \mathbf across src/view/common.py + 4 notebooks. Greek lowercase under \mathbf falls back to upright non-bold (matplotlib limitation requiring usetex=True to overcome); accepted the visual cost. Documented in feedback_matplotlib_mathtext_bold.md memory entry. New rule: ALWAYS smoke-test plotter changes with file_path= to disk, never just in-memory.
Two compounded bugs:
- `_split_on_K_decrease` was the helper checking for K-block boundaries to NaN-break the trajectory line. But `sweep_arch`'s natural Cartesian iteration order keeps K monotonically non-decreasing within each `(c, mu)` group (lambda is the inner loop, K outer factor), so the decrease-only check found zero break-points. Renamed to `_split_on_K_change` and switched to `np.where(diff != 0)`, which fires on any K change. Each K-constant sub-sweep now renders as its own dashed segment.
- K labels only annotated `(K.min(), K.max())`, so only 2 of 4 K bands got labels (e.g., K=8 and K=32 visible, K=10 and K=16 invisible). Switched all four painters to `np.unique(K)` so every distinct K gets a label. Label position changed from `argmax(K == K_val)` (first occurrence = origin cluster) to `np.where(K == K_val)[0][-1]` (last occurrence = high-θ trajectory tip).
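Both fixes can be illustrated without NumPy. This is a pure-Python sketch of the same logic (the project code uses `np.where`); the function names and sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_on_k_change(x: Sequence[float],
                      y: Sequence[float],
                      k: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Insert a NaN pair wherever K differs from the previous sample."""
    out_x: List[float] = []
    out_y: List[float] = []
    for _i in range(len(k)):
        if _i > 0 and k[_i] != k[_i - 1]:
            # A NaN point makes the plotted line break here.
            out_x.append(math.nan)
            out_y.append(math.nan)
        out_x.append(x[_i])
        out_y.append(y[_i])
    return out_x, out_y


def count_decreases(k: Sequence[int]) -> int:
    """The old decrease-only check: finds nothing when K never drops."""
    return sum(1 for _i in range(1, len(k)) if k[_i] < k[_i - 1])


def tip_index(k: Sequence[int], k_val: int) -> int:
    """Last occurrence of k_val: the trajectory tip, not the origin."""
    return max(_i for _i, _v in enumerate(k) if _v == k_val)


_k = [8, 8, 8, 16, 16, 16]
_theta = [0.1, 0.5, 0.9, 0.1, 0.5, 0.9]
_sigma = [0.2, 0.4, 0.6, 0.1, 0.2, 0.3]
_x, _y = split_on_k_change(_theta, _sigma, _k)
print(count_decreases(_k), len(_x), tip_index(_k, 8))
```

With K sorted in non-decreasing order the decrease-only check reports zero breaks, while the any-change version inserts one break between the two K bands.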
derive_calib_coefs(envelope, K_values=[256, 512, 1024]) now tiles the per-n_con_usr observables once per K. Latency R(n) is K-independent (the host probe doesn't manipulate the buffer), so tiling is exact: only theta = L/K, sigma = λW/K, phi = M_act/M_buf shift across K. Notebook cell nb-calib-dim-card reads data/config/method/calibration.json::sweep_grid.K and threads it through. The yoly chart now paints 3 K-trajectories instead of a single point at uvicorn_backlog (16384). New meta.K_values field records the list; legacy meta.uvicorn_backlog retained.
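The tiling step might be sketched as follows, assuming each probe point carries a queue length L, an arrival rate λ and a wait W. The function name and row layout are hypothetical, and `phi` is left out because its inputs are not described here.

```python
from typing import Dict, List, Sequence


def tile_across_k(l_vals: Sequence[float],
                  lam_vals: Sequence[float],
                  w_vals: Sequence[float],
                  k_values: Sequence[int]) -> List[Dict[str, float]]:
    """Repeat the per-probe observables once per K; only ratios change."""
    rows: List[Dict[str, float]] = []
    for _k in k_values:
        for _l, _lam, _w in zip(l_vals, lam_vals, w_vals):
            rows.append({
                "K": float(_k),
                "theta": _l / _k,          # theta = L / K
                "sigma": _lam * _w / _k,   # sigma = lambda * W / K
            })
    return rows


_rows = tile_across_k([2.0, 4.0], [4.0, 8.0], [0.5, 0.5], [256, 512, 1024])
print(len(_rows), _rows[0])
```

Because the measured L, λ and W are reused unchanged, the number of rows is simply probes times K values.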
Legend label corrections after the user pointed out int(1276.92) = 1276 truncates instead of rounding. Switched int(value) → round(value) in _format_path_legend, _paint_single_2d_yoly, _paint_single_3d_yoly. Now 1276.92 → 1277, 957.69 → 958, etc. Also wrapped μ in \overline{\mu} to indicate the architecture-level mean (since aggregate_sweep_to_arch collapses 13 per-node μ values via arithmetic mean).
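The difference between the two built-ins, using the values quoted above:

```python
# int() truncates toward zero; round() goes to the nearest integer.
for _value in (1276.92, 957.69):
    print(_value, int(_value), round(_value))

# Note: Python's round() sends exact halves to the nearest even integer.
print(round(0.5), round(1.5), round(2.5))
```

The half-to-even behaviour only matters for values that land exactly on `.5`, which the legend inputs shown here do not.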
User went through several flip-flops on whether titles should be Plane: θ vs σ or Occupancy vs. Stall, and whether axes should be Occupancy (θ) or just θ. Final agreed split:
- Panel titles (`_YOLY_PANELS`): bare symbols, e.g. `r"$\mathbf{\theta}$ vs. $\mathbf{\sigma}$"`.
- Axis labels (`_DEFAULT_LABELS`): operational name with symbol in parens, e.g. `r"Occupancy ($\mathbf{\theta}$)"`.
- `plot_yoly_arts_hist` x-axis exception: symbol-only override via local `_hist_symbols` map; the dense per-comp grid otherwise becomes unreadable.
plot_yoly_arts_hist reference line and labels (final state after several iterations):
- Reference line + legend at `np.median(_data)`, labelled `$\widetilde{X}$` (X-tilde = sample median, more robust than mean to K-block tail clustering).
- Subplot title is two-line (`\n`-separated): `$\overline{X}=...$` (sample mean via `np.mean`) on top, `$s^{2}=...$` (sample variance via `np.var`, NOT std-dev) below. `pad=8`, inner subgridspec `hspace=0.85` so the two-line title clears the histogram body.
- Number rendering via new local helper `_fmt_sci_mathtext(value, decimals=2)`: produces `r"mantissa.2f \times 10^{exp}"` so the exponent is a proper superscript (not the raw `e-02` alphanumeric suffix `:.Ne` would emit). 2 mantissa decimals (was 3 in an earlier iteration). Handles `value == 0`, NaN, inf.
- Why split metrics across reference vs title: median anchors the visible cluster (the line meeting the histogram bars); mean + variance live in the title because they describe distribution shape across the (n_con_usr × K × c × mu_factor) cloud and are easier to compare across cells than reading off the histogram axis.
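A formatter of this kind could be written as below. It is a sketch, not the project's helper: how zero, NaN and inf are rendered is an assumption, and the caller is expected to wrap the result in `$...$`.

```python
import math


def fmt_sci_mathtext(value: float, decimals: int = 2) -> str:
    """Render value as 'm.mm \\times 10^{e}' for use inside mathtext."""
    if math.isnan(value):
        return r"\mathrm{NaN}"  # assumed rendering
    if math.isinf(value):
        return r"-\infty" if value < 0 else r"\infty"  # assumed rendering
    if value == 0:
        return f"{0:.{decimals}f}"
    # Let the 'e' format do the rounding, then split mantissa and exponent
    # so a mantissa that rounds up to 10.00 is renormalised correctly.
    _mantissa, _exp = f"{value:.{decimals}e}".split("e")
    return rf"{_mantissa} \times 10^{{{int(_exp)}}}"


for _v in (1276.92, 0.0123, 0.0, float("inf")):
    print(fmt_sci_mathtext(_v))
```

Splitting the output of the `e` format avoids computing the exponent by hand with `log10`, which is where off-by-one rounding errors usually creep in.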
Dropped the legacy sig=4 special case for σ in _apply_yoly_panel_axes and _apply_yoly_3d_axes. Originally needed because under Little's law σ_old = λW/L ≈ 1 and tiny variations collapsed at sig=2. After the σ formula correction (2026-04-25, λW/L → λW/K), σ values span a healthy range and read clearly at sig=2. Every yoly panel now uses uniform _apply_sci_format(ax, axes_list=["x", "y"]) with default sig=2.
Several attempts before landing the working solution. Final approach: when subtitle is set, title_h=0.10; pass title=None to build_stacked_figure; manually draw BOTH lines into the dedicated title_ax in axes coords (y=0.72 title, y=0.22 subtitle). Subtitle font bumped to 18 (was 14). Other approaches that failed:
- `_ax.set_title(subtitle, ...)` lands at the top of the 3D body axis, clashes with suptitle.
- `fig.text(0.5, 1 - title_h - 0.005, subtitle, ...)`: figure-coord arithmetic; render-order between suptitle (axes-coord, drawn first) and fig-coord text caused inversions in some configurations.
Lesson: when the figure has a dedicated title-strip axis already, draw EVERYTHING into that axis with explicit axes-coord positions. Don't mix figure-coord and axes-coord text.
Tightened title strip + outer-hspace + body grid spacing across all five yoly plotters so titles don't bleed into the body and y-axis tick labels (with mathtext + scientific notation) don't overlap adjacent panels:
- `plot_yoly_chart`: `title_h=0.045`, `outer_hspace=0.025`, body `wspace=0.32` (60% wider than initial 0.20), `hspace=0.22`.
- `plot_yoly_arts_hist`: `title_h=0.045`, `outer_hspace=0.025`, outer `hspace=0.30`, inner `hspace=0.65`, `wspace=0.40`.
- `plot_yoly_arts_charts`: `title_h=0.045`, outer `hspace=0.25`, `wspace=0.22`, inner `hspace=0.45`, `wspace=0.45`.
- `plot_yoly_arts_behaviour`: `title_h=0.045`, outer `hspace=0.10`, `wspace=0.08`.
- `src/view/common.py`: `_DEFAULT_LABELS`, `_YOLY_PANELS`, `_format_path_legend`, `_paint_*_yoly` (rename + label rounding + every-K labelling + tip placement + `\overline{\mu}`), `_split_on_K_decrease → _split_on_K_change`.
- `src/view/charter.py`: five plotter layouts tightened; `plot_yoly_space` subtitle stacking via dedicated `title_ax`; `plot_yoly_arts_hist` median + `s²` + sci-3-dec + symbol-only x-axis override; `_apply_yoly_panel_axes` + `_apply_yoly_3d_axes` uniform sig=2 sci format.
- `src/methods/calibration.py`: `derive_calib_coefs` accepts `K_values: Optional[List[int]]`; tiles observables across K when provided; meta records `K_values` list.
- `00-calibration.ipynb`: `nb-calib-dim-card` cell reads `sweep_grid.K` and passes to `derive_calib_coefs`.
- `01-analytic.ipynb`, `02-stochastic.ipynb`, `03-dimensional.ipynb`, `04-yoly.ipynb`: title separator `\n → ,`; bar/delta/heat/diff label LaTeX wraps; DataFrame display columns wrapped in mathtext.
- `CLAUDE.md`: View (Plotting) Conventions section extended with all yoly polish rules.
- `notes/titles_std.md`: final tables + status block updated.
- `.claude/skills/develop/notebook-editing.md`: title template + DISPLAY map + matplotlib mathtext bold rule.
- Memory: `feedback_matplotlib_mathtext_bold.md` (new), `project_titles_std_2026_04_27.md`, `project_yoly_k_change_split.md`, `project_yoly_polish_2026_04_27.md` (new).
- All 5 thin notebooks
nbformat.validate()pass. - 6 yoly figures rendered to disk and visually inspected per iteration: trajectory tips show 4 K labels, legend shows
c=k, μ̄=mrounded half-up, panels share sci-2 format, histogram subplot titles readX̃=... s²=..., calibration dim card paints 3 K-trajectories. pytestbaseline unchanged (only label / config / layout changes; no logic touched).
Three refactor passes hit the five thin notebooks (00-calibration, 01-analytic, 02-stochastic, 03-dimensional, 04-yoly) plus src/view/common.py and src/view/charter.py.
Every plot title now reads f"{Method}: {Subject}\n{Scenario}". Method ∈ Calibration / Analytic / Stochastic / DASA / Yoly Chart. The four-key DISPLAY map is identical in every notebook:
DISPLAY = {"baseline": "No Adaptation", "s1": "S1: Retry", "s2": "S2: Select-Reliable", "aggregate": "S1 & S2"}
Yoly subjects use "trade-off Projections" (2D panel grids) and "trade-off space" (3D clouds); "2D" / "3D" qualifiers dropped because the plotter family already encodes dimensionality.
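A minimal sketch of how the title template and the DISPLAY map above could be combined. The helper name build_title is hypothetical and is not taken from the notebooks; only the template and the four map entries come from this entry.

```python
# DISPLAY map as recorded in this entry; identical in every notebook.
DISPLAY = {
    "baseline": "No Adaptation",
    "s1": "S1: Retry",
    "s2": "S2: Select-Reliable",
    "aggregate": "S1 & S2",
}


def build_title(method: str, subject: str, scenario_key: str) -> str:
    """Build a plot title following the Method: Subject / Scenario template.

    build_title is a hypothetical helper used for illustration only.
    """
    scenario = DISPLAY[scenario_key]  # KeyError on an unknown scenario key
    return f"{method}: {subject}\n{scenario}"


print(build_title("Analytic", "Utilisation", "s1"))
```

Looking the scenario up through the map (rather than passing a free-form string) keeps the second title line identical across notebooks.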
Replaced 111 occurrences of \boldsymbol → \mathbf across src/view/common.py, 4 notebooks. matplotlib's built-in mathtext does NOT recognise \boldsymbol and crashes savefig with ParseFatalException: Unknown symbol: \boldsymbol. Lowercase Greek under \mathbf falls back to upright non-bold (a matplotlib limitation that needs usetex=True to overcome); accepted. Roman + uppercase Greek (Δ) DO bold under \mathbf.
Lesson learned: an in-memory smoke test that creates the figure but doesn't savefig skips the tick-bbox path that triggers the mathtext parser. ALWAYS save to disk when smoke-testing label / title / mathtext changes — file_path= is mandatory in the smoke recipe.
Renamed _split_on_K_decrease → _split_on_K_change in src/view/common.py. The previous helper inserted NaN only where K[i] < K[i-1], but sweep_arch's natural Cartesian iteration order keeps K monotonically non-decreasing within each (c, mu) group (lambda is the inner loop, K is the outer factor). The decrease-only check found zero break-points, so matplotlib drew dashed lines connecting the high-theta endpoint of one K-band back to the low-theta start of the next — visually misleading "return-to-origin" zig-zags. Switching to np.where(_diff != 0) (any K change) splits each K-constant sub-sweep into its own segment, fixing the visual.
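The difference between the two break rules can be sketched as follows. This is a numpy-free stand-in written for illustration, not the body of _split_on_K_change; the function name split_on_k and the only_decrease flag are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def split_on_k(
    xs: Sequence[float],
    ys: Sequence[float],
    ks: Sequence[int],
    only_decrease: bool = False,
) -> Tuple[List[float], List[float]]:
    """Insert a NaN pair wherever K breaks, so a line plot starts a new segment.

    only_decrease=True mimics the old rule (break only when K drops);
    the default mimics the new rule (break on any K change).
    """
    out_x: List[float] = []
    out_y: List[float] = []
    for i in range(len(ks)):
        if i > 0:
            if only_decrease:
                is_break = ks[i] < ks[i - 1]
            else:
                is_break = ks[i] != ks[i - 1]
            if is_break:
                out_x.append(math.nan)
                out_y.append(math.nan)
        out_x.append(xs[i])
        out_y.append(ys[i])
    return out_x, out_y


# K is non-decreasing, as in the sweep order described above.
theta = [0.1, 0.9, 0.1, 0.9, 0.1, 0.9]
sigma = [0.2, 0.8, 0.2, 0.8, 0.2, 0.8]
k_vals = [10, 10, 20, 20, 40, 40]

old_x, _ = split_on_k(theta, sigma, k_vals, only_decrease=True)
new_x, _ = split_on_k(theta, sigma, k_vals)
print(len(old_x), len(new_x))
```

With a non-decreasing K column the decrease-only rule inserts no breaks, so the plotted line jumps from the end of one K-band to the start of the next; the any-change rule inserts one NaN pair per band boundary.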
plot_yoly_arts_hist now uses math symbols on subplot titles. plot_yoly_chart / plot_yoly_space / plot_yoly_arts_behaviour / plot_yoly_arts_charts all got title_h=0.025 (was 0.04) and outer_hspace=0.01 so the suptitle hugs the body. Legend labels now use mathtext: f"$\\mathbf{{c}}={int(c_val)},\\,\\mathbf{{\\mu}}={int(mu_val)}$".
- src/view/common.py: _DEFAULT_LABELS + _YOLY_PANELS use \mathbf{} for coefficient symbols; _format_path_legend + _paint_*_yoly legend labels switched to mathtext; _split_on_K_decrease → _split_on_K_change.
- src/view/charter.py: five yoly plotters got tighter title_h + outer_hspace + body grid spacing.
- 00-calibration.ipynb: DataFrame columns use [$\mathbf{\mu s}$] / [$\mathbf{ns}$]; markdown wraps lambda, mu, theta, sigma, eta, phi, M_act, M_buf, c_srv, W_q in $...$; plot titles include host.get('hostname') on the second line.
- 01-analytic.ipynb, 02-stochastic.ipynb, 03-dimensional.ipynb: DISPLAY map standardised; bar_labels / delta_labels / heat_labels / diff_labels all use \mathbf{} mathtext; DataFrame summary columns now bold mathtext.
- 04-yoly.ipynb: Yoly Chart titles + subjects rewritten to "trade-off Projections" / "trade-off space"; inherits axis labels from _DEFAULT_LABELS.
- notes/titles_std.md: final tables + DISPLAY map + naming rules + per-notebook plotter-by-plotter title spec.
- CLAUDE.md: Notebook + View conventions sections updated with title template, DISPLAY map, \mathbf-only rule, K-change NaN-break helper, smoke-test-with-file_path discipline.
- nbformat.validate() on all five thin notebooks: all pass.
- pytest tests/io tests/analytic tests/dimensional tests/utils tests/methods (focused subset): unchanged from baseline (no src/ logic touched).
- plot_arch_delta saved to disk with \Delta \overline{\mu} in a label: passes (the original failing call).
- plot_yoly_chart + plot_yoly_arts_charts saved to disk with _DEFAULT_LABELS + _YOLY_PANELS + mu legend formatter: passes.
Three structural changes landed late on 2026-04-26, all on top of the profile-rescaling work captured in the previous entry.
Both data/config/profile/{dflt,opti}.json now carry parallel top-level blocks:
- artifacts: frozen theoretical model. Cámara 2023 canonical values. Consumed by analytic / stochastic / dimensional. Locked.
- specs: adjustable practical layer. Same node keys + variable structure. Consumed by experiment.run and src/scripts/launch_services.py. Free to diverge from artifacts on c, K, port, mem_per_buffer for prototype-fidelity tuning without contaminating the model.
src/io/config.py::load_profile(adaptation, profile, scenario, source="artifacts") gains a source kwarg. Default "artifacts" keeps every analytic / stochastic / dimensional call bit-identical. experiment.py:495 and launch_services.py:325 switched to source="specs".
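A minimal sketch of the layer switch, assuming the profile JSON has already been parsed into a dict with the two top-level blocks. The helper select_layer and the sample node values are hypothetical; the real load_profile also takes adaptation, profile and scenario arguments and reads from disk.

```python
import copy
from typing import Any, Dict

VALID_SOURCES = ("artifacts", "specs")


def select_layer(profile_doc: Dict[str, Any], source: str = "artifacts") -> Dict[str, Any]:
    """Return a private copy of one layer of a two-layer profile document.

    Hypothetical stand-in for the source= branch of load_profile.
    """
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source {source!r}; expected one of {VALID_SOURCES}")
    # Deep copy so callers can tune the returned layer without touching the other one.
    return copy.deepcopy(profile_doc[source])


# Illustrative document at migration time: specs starts as a deep copy of artifacts.
artifacts = {"TAS_{1}": {"c": 1, "K": 10}}
profile_doc = {"artifacts": artifacts, "specs": copy.deepcopy(artifacts)}

# Operator-driven divergence on the practical layer only.
profile_doc["specs"]["TAS_{1}"]["c"] = 16

model_view = select_layer(profile_doc)                    # default keeps model callers unchanged
deploy_view = select_layer(profile_doc, source="specs")   # prototype launch path
print(model_view["TAS_{1}"]["c"], deploy_view["TAS_{1}"]["c"])
```

Defaulting to "artifacts" is what lets existing analytic / stochastic / dimensional calls stay untouched while only the two experiment call sites opt into "specs".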
Initial state: deep-copy parity (artifacts → specs at migration time, 2026-04-26). Future divergence is operator-driven.
Tests: tests/io/test_config.py::TestSourceSwitch (4 cases, all green). Pre-existing test_lambda_z_only_at_entry and test_reads_setpoint_value loosened from hardcoded 345 to > 0 to absorb the lambda_z editing history.
Dissertation framing. "We separate modelled artifact specifications (the system DASA reasons about: mu, epsilon, c, K, lambda_z, routing) from practical deployment specifications (the runtime configuration the prototype actually uses: c_deployed, K_deployed, port, memory). The split lets the prototype be tuned for measurement fidelity (e.g., raising entry-router c to remove admission saturation) without contaminating the model's predictions. R1/R2/R3 verdicts apply to the modelled topology; experimental error is measured as the gap between the prototype's behaviour at the deployed configuration and the model's prediction at the modelled configuration."
LOG_COLUMNS bumped from 10 to 11 columns:
request_id, service_name, kind,
recv_ts, start_ts, local_end_ts, end_ts,
c_used_at_start,
success, status_code,
size_bytes
New mark_local_end() API in src/experiment/services/instruments.py (paired with a _local_end_var contextvar). mount_atomic_svc calls it right after admission release + eps + target pick, immediately before await dispatch(...). Terminals don't call it; @logger defaults local_end_ts = end_ts for them.
_build_svc_df_from_logs now produces two views per node:
- Local (default rho/L/W columns): from local_end_ts - start_ts. M/M/c/K-comparable. Used by analytic / stochastic / dimensional cross-checks.
- Total (parallel rho_total/L_total/W_total): from end_ts - start_ts. Client-perceived end-to-end. Used for Cámara R2 validation.
For atomic / terminal nodes the two views coincide. For composite routers (TAS_{*}) they differ by the dispatch-await time.
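The two views can be sketched with a contextvar, under stated assumptions: mark_local_end and _local_end_var are named in this entry, while build_row, composite_handler, terminal_handler and local_and_total are hypothetical stand-ins, and asyncio.sleep replaces both the local work and the awaited dispatch.

```python
import asyncio
import contextvars
import time
from typing import Dict, Tuple

# Per-request slot; None means the handler never marked a local end.
_local_end_var = contextvars.ContextVar("_local_end_var", default=None)


def mark_local_end() -> None:
    """Record the moment local work finished, just before dispatching downstream."""
    _local_end_var.set(time.perf_counter_ns())


def build_row(start_ts: int, end_ts: int) -> Dict[str, int]:
    """Hypothetical row builder: local_end_ts falls back to end_ts when unmarked."""
    local_end_ts = _local_end_var.get()
    if local_end_ts is None:
        local_end_ts = end_ts
    return {"start_ts": start_ts, "local_end_ts": local_end_ts, "end_ts": end_ts}


async def composite_handler(local_s: float, dispatch_s: float) -> Dict[str, int]:
    _local_end_var.set(None)            # clear any value from a previous request
    start_ts = time.perf_counter_ns()
    await asyncio.sleep(local_s)        # stands in for admission + target pick
    mark_local_end()
    await asyncio.sleep(dispatch_s)     # stands in for the awaited downstream call
    return build_row(start_ts, time.perf_counter_ns())


async def terminal_handler(local_s: float) -> Dict[str, int]:
    _local_end_var.set(None)
    start_ts = time.perf_counter_ns()
    await asyncio.sleep(local_s)        # terminals never call mark_local_end()
    return build_row(start_ts, time.perf_counter_ns())


def local_and_total(row: Dict[str, int]) -> Tuple[int, int]:
    """Return (local, total) durations in nanoseconds."""
    return row["local_end_ts"] - row["start_ts"], row["end_ts"] - row["start_ts"]


composite_row = asyncio.run(composite_handler(0.01, 0.03))
terminal_row = asyncio.run(terminal_handler(0.01))
print(local_and_total(composite_row), local_and_total(terminal_row))
```

For the composite row the local duration excludes the dispatch wait; for the terminal row the two durations are equal, matching the statement above.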
Why. Pre-bump, end_ts - start_ts for composite routers included the whole downstream subtree's processing time because the handler awaits the dispatched response inside its own bracket. That made TAS_{1}'s W = end-to-end response time across the entire architecture, producing the spurious "TAS_{1}.L blew up to 200" pattern. The Cámara-rate-rescaling memory entry (2026-04-23) attributing this to atomic saturation was wrong; atomic max rho stayed under 0.20 across all four adaptations even at lambda_z = 345. Fixed entry now in memory/project_camara_rate_rescaling_pending.md reflects the resolution.
213/213 experiment tests green post-change.
After cycling through 250 / 200 / 150 during the morning's diagnosis, lambda_z is restored to the published Cámara 2023 value of 345 req/s at TAS_{1} in both layers (artifacts + specs). The user authorised this as an explicit exception to the "artifacts is frozen" rule because the canonical published value is the right anchor for the model layer.
All downstream \lambda_{...} setpoints rescaled proportionally by 345/250 = 1.38 across both layers.
- New memory entry project_artifacts_specs_split.md (full migration record).
- New memory entry project_local_end_ts_observable.md (schema bump + composite-router observable diagnosis).
- Updated project_camara_rate_rescaling_pending.md (RESOLVED).
- Updated project_qn_config_conventions.md (lambda_z=345 + two-layer schema note).
- Updated MEMORY.md index (3 new pointers, 2 description rewrites).
- Updated CLAUDE.md "Data Convention" bullets: schema split, lambda_z=345, 11-column LOG_COLUMNS with local_end_ts, two-view operational metrics.
- User-side: re-run 01-04 notebooks at the artifacts layer for sanity (no code change needed; default source="artifacts").
- User-side: re-run 05-experimental.ipynb at the specs layer to populate the new local_end_ts and _total columns; then iterate on specs divergence from artifacts to relieve TAS_{1} entry-router admission (likely specs.TAS_{1}.c = 16 or higher to deliver 250 req/s without saturation).
- Re-deriving the analytic JSON's \W/\L/\Wq/\Lq setpoints from the queue solver after lambda_z rescaling (currently λW ≠ L at the JSON seeds, so test_sigma_close_to_theta_under_little is failing; running 01-analytic regenerates them).
2026-04-26 — Profile rescaling for prototype throughput floor + composite-router observable diagnosis
Goal of the day. Make the experimental method produce results that align with the analytical / stochastic predictions (the dimensional + experimental adaptations had been showing "worse than baseline" deltas while analytical / stochastic showed improvements). Diagnosis traversed three layers — entry rate, server count, K buffer — before landing on a deeper issue: the entry composite's W observable is system-wide, not local.
Closed-form Jackson + M/M/c/K solve over c ∈ {1, 2, 3, 4, 6, 8} × K ∈ {10, 20, 40, 60, 100} per adaptation, holding mu fixed:
| c | baseline knee | s1 knee | s2 knee | aggregate knee |
|---|---|---|---|---|
| 1 | 472 req/s | 437 req/s | 460 req/s | ~437 req/s |
| 2 | 944 | 874 | 921 | ~874 |
| 3 | 1416 | 1311 | 1381 | ~1311 |
| 4 | 1888 | 1748 | 1841 | ~1748 |
K does not affect the saturation knee (rho = lambda / (c * mu) is the gate). K controls blocking probability and buffer depth at the knee, nothing else.
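A small sketch of why the knee scales with c but not with K, using the standard single-node M/M/c/K blocking formula. It assumes K is the total system capacity (in service plus waiting) and uses 472 req/s as an illustrative per-server rate taken from the baseline c=1 row of the table; the function names are not from _sandbox/analyse_knee.py.

```python
from math import factorial


def offered_rho(lam: float, mu: float, c: int) -> float:
    """Offered utilisation; note that K does not appear."""
    return lam / (c * mu)


def knee_rate(mu: float, c: int) -> float:
    """Arrival rate at which offered_rho reaches 1."""
    return c * mu


def mmck_blocking(lam: float, mu: float, c: int, K: int) -> float:
    """Blocking probability p_K of an M/M/c/K queue (K = system capacity, K >= c)."""
    a = lam / mu
    rho = a / c
    weights = []  # unnormalised state probabilities p_n / p_0
    for n in range(K + 1):
        if n < c:
            weights.append(a ** n / factorial(n))
        else:
            weights.append(a ** c / factorial(c) * rho ** (n - c))
    return weights[K] / sum(weights)


for c in (1, 2, 3, 4):
    print(c, knee_rate(472.0, c))

for K in (10, 20, 40):
    print(K, offered_rho(400.0, 472.0, 1), mmck_blocking(400.0, 472.0, 1, K))
```

The first loop reproduces the baseline column of the table; the second holds the arrival rate fixed and shows the utilisation staying put while the blocking probability falls as K grows.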
Sweep script kept at _sandbox/analyse_knee.py for reuse.
- c=2, K=40 uniform across all 13 / 16 artifacts. Knee at 874 req/s s1 worst case. Reverted next step.
- c=1, K=80 uniform. Knee unchanged at 437 req/s s1 worst, deeper buffer for prototype tail behaviour.
- TAS_{1} mu = 900 → 700. Aligns the entry composite with the other 700-req/s TAS components.
- lambda_z 345 → 150 req/s at the entry. All downstream \lambda_{...} setpoints rescaled by factor 150/345 = 0.4348 (Jackson-linear, exact). Analytical bottleneck rho dropped from 0.69 (saturated tail) to 0.30 (clean steady state). Per-artifact lambda_z (mostly 0 for non-entry) and the _data arrays under \lambda variables also rescaled.
- TAS_{*} K=10, atomics K=80. Asymmetric K reflects that routers have shallow queueing semantics, atomics absorb propagated bursts.
- TAS_{*} c=2, atomics c=1. Then TAS_{*} c=4, atomics c=1. Heterogeneous c: see "TAS_{1} composite-router observable" below for why.
lambda_z was bumped externally from 150 to 250 between steps 5 and 6 (probably via the analytical notebook's re-derivation pass); current state is 250.
Final config:
| Tier | c | K | mu (unchanged) |
|---|---|---|---|
| TAS_{1..6} (composite routers) | 4 | 10 | 700 (TAS_1, was 900) / 700 (others) |
| MAS / AS / DS (atomic domain) | 1 | 80 | unchanged |
Per-node analytical rho at lambda_z=250:
- baseline: bottleneck MAS_{3} rho=0.503; max TAS rho=0.089
- s1: MAS_{3} rho=0.543; max TAS rho=0.089
- s2: DS_{1} rho=0.516; max TAS rho=0.089
- aggregate: DS_{1} rho=0.500; max TAS rho=0.089
Atomic services are at 50-55 % utilisation (comfortable steady state); composite TAS services are at 9 % utilisation analytically — but the experimental observable diverges, see below.
Pre-edit experimental results for s2/aggregate showed TAS_{1} W = 1.86 s / 1.42 s while atomic services stayed at rho < 0.20. The Cámara-rate-rescaling concern (memory entry from 2026-04-23) was wrong — atomics are not saturated. The real issue:
TAS_{1}'s end_ts - recv_ts measures whole-architecture response time, not local queueing. The composite handler dispatches downstream and AWAITS the dispatched response inside its own start_ts → end_ts bracket. So:
- TAS_{1} W = end-to-end response time across TAS_{1} → TAS_{2..4} → MAS_{*} → AS_{*} → DS_{*} → return.
- TAS_{2} W = whole subtree under medical kind.
- TAS_{6} W = local (terminal in current routing).
Little's law L = X * W applied to TAS_{1} gives system-wide in-flight, NOT local queue length. Comparing this to analytical L_{TAS_{1}} (which is local M/M/c/K queue at TAS_{1} only) is an apples-to-oranges error — both are correct, but they measure different observables.
Admission-saturation forecast at lambda=250 req/s, dispatch_wait=100ms (the observed s2 W):
| c at TAS_{1} | local rho_admission |
|---|---|
| 1 | 25.0 (saturated) |
| 2 | 12.5 (saturated) |
| 4 | 6.25 (saturated) |
| 8 | 3.12 (saturated) |
| 16 | 1.56 (saturated) |
| 32 | 0.78 (steady) |
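The table follows from treating the dispatch wait as the effective service time at the entry router, so rho_admission = lambda × dispatch_wait / c. A short sketch that recomputes it (the function names are illustrative):

```python
import math


def admission_rho(lam: float, dispatch_wait_s: float, c: int) -> float:
    """Admission utilisation when each of c slots is held for dispatch_wait_s."""
    return lam * dispatch_wait_s / c


def min_servers(lam: float, dispatch_wait_s: float) -> int:
    """Smallest integer c for which admission_rho drops below 1."""
    return math.floor(lam * dispatch_wait_s) + 1


for c in (1, 2, 4, 8, 16, 32):
    rho = admission_rho(250.0, 0.100, c)
    if rho >= 1.0:
        state = "saturated"
    else:
        state = "steady"
    print(f"c={c:>2}  rho={rho:.3f}  {state}")

print(min_servers(250.0, 0.100))
```

At 250 req/s and a 100 ms wait the offered load is 25 slot-seconds per second, so any c of 25 or below saturates; this is why c=4 only shrinks the queue rather than removing it.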
c=4 reduces but does not eliminate the entry-router admission queue at lambda_z=250 if dispatch-await stays at ~100 ms. The proper fix is structural: stop measuring the dispatch-await as part of TAS_{1}'s service time. Either:
- (a) Add a local_end_ts capture right before the dispatch httpx call; use local_end_ts - start_ts for composite rho/L/W. Aligns the observable with the analytical M/M/c/K assumption.
- (b) Stop comparing composite rows to analytical L/W in 07-comparison.ipynb; for TAS_{*} compute a different cross-method observable (system-wide in-flight = sum of L_local across the subtree).
(a) is cleaner; (b) is faster to ship. Pending decision until experiments are re-run with TAS c=4 to see if the W blowup is meaningfully relieved.
Three framings for the asymmetric c=4 (TAS) / c=1 (MAS/AS/DS) split, in order of increasing strength for paper review:
- Operational: "TAS_{1} is a multi-worker HTTP front-end (Tomcat / uvicorn / Gunicorn default), modelled as a thread pool with c=4. Cámara 2023's c=1 abstraction underestimates entry concurrency."
- Architectural: "Server count c reflects role: routing-only nodes (TAS_{*}) are stateless and trivially parallelisable (c=4); atomic domain nodes (MAS / AS / DS) represent single underlying resources (c=1). Adaptation operates over the domain layer, so the asymmetry is intrinsic to the case study."
- Methodological: "We raise c at TAS_{*} so the entry router stops dominating measured response time, recovering the domain-layer adaptation differentials that motivate the case study."
Framing (2) is the strongest because it ties c to architectural role rather than instrumentation convenience and survives reviewer scrutiny. Note the OLD replication used uniform c=1 for byte-exactness; the new spec breaks that, traded for a meaningful 1000-req/s prototype.
The 2026-04-23 memory entry project_camara_rate_rescaling_pending.md claimed the seeded mu/lambda_z exceeded the prototype's ~200 req/s ceiling and were biasing 07-comparison. The pre-edit experimental data shows atomic rho < 0.20 across all four adaptations even at lambda_z=345, so atomic saturation was not the cause. The real cause was the composite-router observable mismatch (above). Memory entry to be updated.
- Re-run all four experiment notebooks (analytic / stochastic / dimensional / experiment) with the new (c, K, mu, lambda_z) profile. Compare per-node rho across methods.
- Decide between fix (a) local_end_ts and fix (b) cross-method composite observable for 07-comparison.
- Update memory entry on Cámara-rate-rescaling.
σ = λW/L → σ = λW/K. User flagged the methodology-correct stall-coefficient formula. The old form was Little's-law identity (≈1 in steady state, structurally insensitive to K); the new form measures queueing share of capacity. Fix landed across:
- data/config/method/dimensional.json::coefficients[1].expr_pattern ({pi[0]}*{pi[3]}**(-1)).
- src/dimensional/networks.py::sweep_artifact and sweep_arch inner-loop expressions.
- src/dimensional/reshape.py::aggregate_arch_coefs (denominator sum(K)) and aggregate_sweep_to_arch.
- src/methods/calibration.py::_run_calib_pipeline (LaTeX \frac{λ·W}{K}).
- src/experiment/architecture.py::sweep_arch_exp (analytic body).
- src/view/qn_diagram.py::DIM_GLOSSARY_DEFAULT (legend).
- .claude/skills/develop/pydasa-usage.md Stall row (canonical-coefficients table).
- Tests: tests/dimensional/test_coefficients.py, tests/dimensional/test_sensitivity.py, tests/experiment/test_architecture.py.
- Notebook captions: 00-calibration.ipynb, 04-yoly.ipynb, 06-yoly-experimental.ipynb.
Under Little's law (λW = L), σ_new ≡ θ on closed-form solves. On prototype runs the equality only holds approximately because operational λ counts every arrival but L = X·W uses successful-throughput X; tests/experiment/test_architecture.py::test_sigma_close_to_theta loosens to rtol=0.5 to absorb this.
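A worked sketch of the two stall forms, with made-up numbers chosen only to make the arithmetic easy to follow. The function names are illustrative and not taken from src/dimensional.

```python
def stall_old(lam: float, W: float, L: float) -> float:
    """Old form: lambda * W / L, which is Little's-law identity in steady state."""
    return lam * W / L


def stall_new(lam: float, W: float, K: float) -> float:
    """New form: lambda * W / K, the queueing share of capacity."""
    return lam * W / K


def occupancy(L: float, K: float) -> float:
    """theta = L / K."""
    return L / K


# Closed-form case: L is exactly lambda * W.
lam, W, K = 100.0, 0.05, 10.0
L = lam * W
print(stall_old(lam, W, L), stall_new(lam, W, K), occupancy(L, K))

# Prototype-like case: lambda counts every arrival, but L is built from
# successful throughput X (assumed here to be 90 % of lambda).
X = 0.9 * lam
L_obs = X * W
print(stall_new(lam, W, K), occupancy(L_obs, K))
```

In the closed-form case the old form collapses to 1 regardless of K while the new form equals theta. In the prototype-like case the new form exceeds theta by the factor lambda / X, which is the gap the loosened tolerance is meant to absorb.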
Module rename. src/experiment/networks.py → src/experiment/architecture.py (homonym disambiguation vs src/dimensional/networks.py); tests/experiment/test_networks.py → tests/experiment/test_architecture.py. Public alias from src.experiment import sweep_arch_exp already in __init__.py, so external callers were untouched. Stale references swept from CLAUDE.md and src/methods/calibration.py docstring.
Audit campaign. Ran systematic 3-skill audits (code-documentation + coding-conventions + style-polish) on every src/dimensional + src/methods/{calibration,experiment} + src/io/tooling + src/experiment/architecture module and their test parity files. Recurring patterns:
- R16 stacked-# runs collapsed to one-line whys (≈40 sites).
- src.view.dc_charts.<plotter> → src.view.<plotter> public-alias references (≈8 sites).
- Bare except Exception: narrowed to specific types: (OverflowError, ValueError, ZeroDivisionError) for M/M/c/K solver, (RuntimeError, OSError, ConnectionError) for uvicorn launches, (httpx.HTTPError, ConnectionError, OSError) for httpx readiness probes. The K-disappearance bug (solver overflow at K=16384 for c≥2) had been silently swallowed for weeks; narrowing surfaced it as a real solver ceiling.
- Lazy stdlib imports (ctypes, os, solve_jackson_lams) promoted to module top.
- Test type-hint sweep: every test_* method got -> None + fixture-arg types.
- *test_name()* lead-in convention enforced on every test docstring; module-docstring class-bullet lists matched against actual class counts.
- Optional[X] not X | None, Dict[...] not dict[...] for project consistency.
New helpers in src/dimensional/reshape.py: _safe_div(num, den) and _per_combo_mean(sweep_data, art_keys, sym_template) to remove duplication.
Coverage gap closed. Added TestAggregateSweepToArch with 5 contracts using a synthetic 2-artifact sweep.
Jupyter-safe asyncio dispatch added to src/methods/experiment.py::_run_async_safe (worker-thread ProactorEventLoop/SelectorEventLoop when an ambient loop is detected; falls back to asyncio.run when none). Lets _RUN_RATE_SWEEP = True work in 00-calibration.ipynb without the RuntimeError: asyncio.run() cannot be called from a running event loop.
Calibration completion. Per-host JSON now carries dimensional_card (PyDASA-routed) + rate_sweep (calibrated_rate=200 req/s for DESKTOP-INKGBK6) + 128 kB payload threading from JSON config. src.io.load_dim_card accessor lazy-derives the card when not pre-baked.
Route-A predicted sweep removed (2026-04-25). derive_calib_sweep (closed-form M/M/c/K via src.dimensional.networks.sweep_artifact) was deleted along with TestCalibSweep (5 cases) and notebook section 6c. Calibration must be measurement, not theory; mixing loopback.median_us with M/M/c/K projection contradicted the calibration contract. The sweep_grid block in data/config/method/calibration.json is preserved because _build_ping_app reads sweep_grid.{c, K}[0] to seed the vernier service spec; the unused fields stay dormant until scale-2.md lands a CSV-driven sweep.
Test count after the campaign. 107+ tests across tests/dimensional/, tests/methods/test_calibration.py, tests/methods/test_experiment.py, tests/io/test_tooling.py, tests/experiment/test_architecture.py all green. The audit applied ≈80 individual fix items across ≈20 src + tests files.
Added src.methods.calibration.derive_calib_coefs(envelope, payload_size_bytes=0) producing theta / sigma / eta / phi from the measured handler_scaling + loopback blocks (Route B — measurement, not M/M/c/K prediction). Plumbing:
- μ = 1e6 / loopback.median_us (host bare-metal service rate).
- For each n_con_usr level: R = median_us × 1e-6, X = n/R, L = n, Wq = (median_us − loopback.median_us) × 1e-6.
- θ = L/K, σ = Wq·λ/L, η = X·K/(μ·c_srv), φ = (L·B)/(K·B) = L/K when payload is constant.
- ε excluded: /ping has no business logic that can fail.
- Output dict uses LaTeX-subscripted keys ready for src.view.dc_charts.plot_yoly_chart. No new plotter; the notebook renders the card with the same helper the dimensional method uses on TAS architectures.
- Stored under envelope["dimensional_card"]; notebook section 6b displays it.
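The plumbing above can be traced for a single concurrency level as follows. This is an illustrative sketch, not derive_calib_coefs itself: the function name, the plain (non-LaTeX) dict keys, the sample numbers, and the choice λ = X (the list does not say which rate feeds σ) are all assumptions.

```python
from typing import Dict


def derive_point(
    n_con_usr: int,
    median_us: float,
    loopback_median_us: float,
    c_srv: int,
    K: int,
) -> Dict[str, float]:
    """Compute theta / sigma / eta for one handler-scaling level."""
    mu = 1e6 / loopback_median_us                   # host bare-metal service rate [req/s]
    R = median_us * 1e-6                            # response time [s]
    X = n_con_usr / R                               # throughput [req/s]
    L = float(n_con_usr)                            # requests in system
    Wq = (median_us - loopback_median_us) * 1e-6    # waiting time above the loopback floor [s]
    lam = X                                         # ASSUMPTION: arrival rate equals throughput
    return {
        "theta": L / K,
        "sigma": Wq * lam / L,
        "eta": X * K / (mu * c_srv),
    }


# Made-up numbers: 1000 us loopback floor, 4000 us median at 2 concurrent users.
point = derive_point(n_con_usr=2, median_us=4000.0, loopback_median_us=1000.0, c_srv=1, K=10)
print(point)
```

Under the λ = X assumption σ reduces to Wq / R, the fraction of the response time spent above the loopback floor, which is 0.75 for these sample numbers.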
Caveat. φ is NaN by default because every /ping request carries the same body, making memory utilisation identical to θ (degenerate-memory case). Becomes informative only after the payload-echo upgrade (128/256 kB body). Noted in the notebook markdown + CLAUDE.md.
Test count: 7 new TestCalibDimCard cases, all green. Helper reuses the existing dimensional vocabulary (same LaTeX subscripts, same plotter input shape) so the calibration fits into the DASA coefficient-space story without new view code.
What closed. P0.1-P0.4 (host harness + rate-sweep fold-in + pre-run gate + first baseline), scoped P1 (bounded deque(maxlen=500_000) + record_row + dropped_count + drain() + perf_counter_ns in the hot path), and the P2 stop-gate all landed on 2026-04-23 / 2026-04-24. Full detail in notes/calibration.md Checkpoint log.
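One way the bounded in-memory log could behave, sketched under assumptions: the class name BoundedRowLog is hypothetical, and the drop-oldest semantics and the exact meaning of dropped_count are guesses consistent with a deque(maxlen=...), not a description of the landed code.

```python
import time
from collections import deque
from typing import Any, Deque, Dict, List


class BoundedRowLog:
    """In-memory row buffer that never grows past maxlen and counts evictions."""

    def __init__(self, maxlen: int = 500_000) -> None:
        self._rows: Deque[Dict[str, Any]] = deque(maxlen=maxlen)
        self.dropped_count = 0

    def record_row(self, row: Dict[str, Any]) -> None:
        """Hot path: O(1) append, no disk I/O."""
        if len(self._rows) == self._rows.maxlen:
            # The append below evicts the oldest row; count it (assumed semantics).
            self.dropped_count += 1
        self._rows.append(row)

    def drain(self) -> List[Dict[str, Any]]:
        """Hand all buffered rows to the caller (e.g. a periodic writer) and reset."""
        rows = list(self._rows)
        self._rows.clear()
        return rows


log = BoundedRowLog(maxlen=3)
for request_id in range(5):
    log.record_row({"request_id": request_id, "recv_ts": time.perf_counter_ns()})

drained = log.drain()
print(log.dropped_count, [row["request_id"] for row in drained])
```

A tiny maxlen is used here only to make the eviction visible; with the 500_000 default an empty drop count after a run is the signal that nothing was lost.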
P2 verdict. 5 trials of experiment.run(adp=baseline) against the post-P1 code: every trial completed cleanly (stopped=schedule_complete, log_drop_counts == {}), client_effective_rate mean 6.82 req/s (range 6.49-7.26, ~6 % spread), W_net mean 17.5 ms with a visible warm-in trend, wall-clock 173.7 s per trial. Interpretation: safety properties confirmed; the bounded-deque invariant holds, ns-precision is stable, nothing regressed. Performance lift is NOT decided — the default ramp tops out ~7 req/s, far below the ~180 req/s degradation point the calibration found. The handler-scaling data (8× latency degradation at c=10 on an empty /ping handler with ZERO logging) already strongly suggests event-loop queueing inside each service is the dominant bottleneck, not logger overhead. A saturation-regime A/B bench would cost many trials × many rates × many minutes of wall time; deferred until a use case demands it.
Module renames. Three files were called calibration.py. Kept the runner (src/methods/calibration.py) and renamed the other two for clarity:
- src/io/calibration.py → src/io/tooling.py
- src/view/calibration.py → src/view/characterization.py
Public API (from src.io import ... / from src.view import ...) unchanged.
Reference baseline for this host (DESKTOP-INKGBK6). Clean re-bench on the post-refactor code, apps closed:
| Probe | Number |
|---|---|
| Timer min / median / std | 100 ns / 100 ns / 392 ns |
| Jitter mean / p99 / max | 663 μs / 1357 μs / 1985 μs |
| Loopback median / p99 | 1.29 ms / 2.21 ms |
| Handler c=1 → c=10000 | 1.5 ms → 30 s (log-log) |
Every experiment result on this host should report reported = measured_us − 1288.5 µs ± 1357.1 µs.
Next. P3.1 (extract endpoints to experiment.json) is the highest-leverage refactor. P4 is blocked on having a second LAN machine. A live rate-sweep would unblock the pending Camara rate-rescaling decision (project_camara_rate_rescaling_pending.md).
Plan filed. notes/calibration.md now holds the living memory + checkpoint doc for a multi-phase effort: (P0) per-host noise-floor harness, (P1) @logger append + periodic-drain refactor to kill mid-run disk I/O, (P2) local re-baseline, (P3) remote-ready packaging, (P4) 3-machine LAN deployment, (P5) comparison + case-study integration. Status column in that file is the single source of truth; this devlog gets only the transitions.
Filesystem split applied. Mirrored data/img/experiment/ in data/results/experiment/: both now carry calibration/, local/<adaptation>/, remote/<adaptation>/. Existing single-laptop results moved under local/; .gitkeep markers placed on every new empty directory per data/results/.gitignore convention (content ignored, structure tracked). src/io writers + src/view plotters still emit to pre-split paths; that wiring is phase P3.1, not landed.
Why now. The experiment method currently degrades measurably above ~180 req/s on the single laptop. The Camara-rate rescaling question (2026-04-23 entry below) only becomes answerable once the noise floor is characterized per host — otherwise we cannot tell whether "degradation" is measurement noise, logger back-pressure, or real service saturation.
Stop-gate. P3/P4 do NOT start until P2 has proven the logger refactor lifted the ceiling. If the refactor shows no lift, logger was not the bottleneck (per feedback_measure_before_assume.md) and the plan pivots toward the OS scheduler / HTTP stack / service saturation branches before sinking days into remote deployment.
Open question. The seeded values in data/config/profile/{dflt,opti}.json come from Weyns & Calinescu 2015 + Camara 2023 (Java/ReSeP stack): mu in [150, 1580] req/s and lambda_z = 345 req/s at TAS_{1}. The FastAPI prototype in src/experiment/ cannot sustain those rates: python -m src.methods.calibration --rate-sweep --rate-sweep-target-loss 1.0 reports the highest sustainable rate at <= 1 % effective-rate loss is ~200 req/s. Above that, the asyncio chain + httpx connection pool + executor wakeup dominate and the client undershoots the target by 7-30 %.
Why it matters. If 07-comparison.ipynb runs analytic at lambda_z=345 and experiment at lambda_z=345-but-actually-280, the headline analytic-vs-experiment delta is dominated by client undershoot, not by DASA tech-agnosticism. The DASA claim becomes untestable until the operating points line up.
Two options.
- Scale lambda_z down (preserve mu ratios). Pick lambda_z = 200 (or whatever --calibrate 1.0 returns at the time). Update dflt.json and opti.json symmetrically. Analytic + experiment then meet at the prototype-sustainable rate.
- Scale mu up (preserve lambda_z = 345). Bump every mu setpoint so the prototype headroom matches Camara's. Risk: large mu values push asyncio.sleep below the OS-timer floor at the per-service tick.
Option 1 is the cheaper move; option 2 is closer to the original paper's QoS targets. Defer the decision until we wire the two notebooks (05-experimental + 06-yoly-experimental) at the candidate operating points and observe the comparison quality.
Markers. TODO_revisit_rates keys added to both profile JSONs so a grep finds the same context from the data side. Resolve both at the same time (delete the keys when the decision lands).
Decision. Split the experiment method into two notebooks, mirroring the dimensional / yoly split locked on 2026-04-19:
- 05-experimental.ipynb keeps the fixed-point per-adaptation execution (one (mu, c, K) per adaptation, lambda ramped to saturation, side-by-side analytic prediction + R1/R2/R3 verdict).
- 06-yoly-experimental.ipynb adds a configuration-sweep yoly view measured on the FastAPI prototype, reusing the dc_charts plot vocabulary (yc_arch, sb_arch, ad_per_node, yab_per_node, yac_per_node, before/after overlay).
What changed.
- src/experiment/networks.py: new module exposing sweep_arch_exp(cfg, sweep_grid, *, method_cfg, adp). Mirrors src.dimensional.networks.sweep_arch shape; each combo overrides every node's mu / c / K, launches the mesh once, and derives one (theta, sigma, eta, phi) point per artifact. Reuses _run_async + _build_svc_df_from_logs from src.methods.experiment via local import to avoid a circular dependency.
- src/experiment/__init__.py re-exports sweep_arch_exp.
- data/config/method/experiment.json adds a sweep_grid block (mu_factor=[0.5, 1.0, 2.0], c=[1, 2], K=[10, 32], util_threshold=0.95): 12 combos. Deliberately small because each combo is a real mesh launch + ramp (~30 s).
- tests/experiment/test_networks.py covers shape / dimensional bounds / stability gate via a 1-combo _QUICK_GRID + tight ramp; 8 tests in 2.22 s.
- 06-comparison.ipynb renumbered to 07-comparison.ipynb.
- CLAUDE.md + notes/workflow.md table updated to reflect the 7-notebook layout (5 methods, two of them split).
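The 12-combo count follows from the Cartesian product of the three swept axes (3 × 2 × 2). A short sketch of the expansion; expand_grid is a hypothetical helper, not the iteration code inside sweep_arch_exp, and util_threshold is left out because it is a scalar gate rather than a swept axis.

```python
from itertools import product
from typing import Any, Dict, List, Sequence

# Swept axes as listed in the sweep_grid block above.
SWEEP_GRID: Dict[str, Sequence[Any]] = {
    "mu_factor": [0.5, 1.0, 2.0],
    "c": [1, 2],
    "K": [10, 32],
}


def expand_grid(grid: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Return one dict per combination, with the last key varying fastest."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


combos = expand_grid(SWEEP_GRID)
print(len(combos))
print(combos[0], combos[-1])
```

At roughly 30 s per mesh launch + ramp, 12 combos come to about six minutes per adaptation, which is why the grid is kept this small.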
Why launch-per-combo, not in-process reconfig. The simpler path; keeps the sweep helper a thin orchestrator over the existing run pipeline. In-process knob mutation would require service-side support and is deferred until the small-grid path proves insufficient.
Validation. Test suite green. Notebook end-to-end run pending — to be confirmed once the small-grid sweep is exercised on a development laptop.
Incremental user-driven polish after the initial plot_dim_topology landing.
- plot_qn_topology: node labels now show L = <val> (avg number in system, requests) instead of rho = <val> (unitless, already in the colourbar). Colouring is unchanged (still rho-driven); only the label value changed. All four analytic adaptation topologies regenerated.
- plot_dim_topology: three refinements:
  - $\eta = \frac{\chi \cdot K}{\mu \cdot c}$ (explicit \cdot between multi-symbol factors so mathtext renders visible multiplications instead of kerning symbols together).
  - Scientific notation .2e across every numeric display (table cells, node labels, NETWORK overlay). Coefficients span orders of magnitude across scenarios (phi goes from ~1e-3 baseline to ~1e-1 heavy load); uniform .2e prevents fixed-point formats from hiding the variation.
  - color_by="eta" default + data-driven min-max normalisation pinned into the memory so future callers do not cap at 1.
- Regenerated: data/img/analytic/{baseline,s1,s2,aggregate}/topology.{png,svg} via full 01-analytic.ipynb re-execution; data/img/dimensional/{baseline,s1,s2,aggregate}/topology.{png,svg} via direct calls + re-executed 03-dimensional.ipynb.
- CLAUDE.md + memory updated: the uniform-format rule ("if you mix .2e with .4f across sites of the same figure you create false visual comparability"), the label-shows-L convention on qn_topology, and the overlay $\bar{sym}$ (Name): value format are all pinned.
Closed the 15-rule src + tests audit (docstring wrapping, acronyms, verb-first, type hints, locals prefix, dataclass fields, first-def pedagogy, no inline ternaries, section banners, no em-dashes, boolean decomposition, imports at top, @property getters, British English, neutral increase/decrease). Every src module + tests mirror + demo + notebook markdown was walked; every stage logged in notes/audit.md. The 11 deferred B-batch public-API renames (B1 / B3 / B5 / B6 remainder / B7 / B8 / B9 / B10 / B11 + B4 / B12 internal) drained in one final sweep.
- B-batch executed (30+ symbols): NetworkConfig → NetCfg, load_method_config → load_method_cfg, Service* → Svc* (Spec / Request / Response / Context), ServiceRegistry → SvcRegistry, ExternalForwardFn → ExtFwdFn, mount_atomic_service → mount_atomic_svc, mount_composite_service → mount_composite_svc, ArtifactSpec._setpoint → .read_setpoint, ._sub → .format_sub, per_artifact_lambdas → compute_lams_per_artifact, per_artifact_rhos → compute_rhos_per_artifact, lambda_z_for_rho → invert_rho_to_lam_z, solve_jackson_lambdas → solve_jackson_lams, lambda_zero (param) → lam_z, simulate_network → simulate_net, solve_network (stochastic) → solve_net, _time_weighted_mean → compute_time_weighted_mean, _model_string → format_model_string, aggregate_network → aggregate_net, check_requirements → check_reqs, sweep_architecture → sweep_arch, _find_max_stable_lambda_factor → _find_max_stable_lam_factor, networks_setpoint → read_setpoint, coefs_delta → compute_coefs_delta, network_delta → compute_net_delta, ClientConfig / RampConfig / CascadeConfig → *Cfg, _avg_request_size → _compute_avg_req_size, _specs_from_config → _build_specs_from_cfg, _routing_row → _read_routing_row, _router_kind_map → _build_router_kind_map, lambda_z_entry → get_lam_z_entry. Full before / after table in project_b_batch_renames memory.
- Held back: CSV column names on SvcResp (service_name, message), JSON-backed fields on ClientCfg (entry_service, request_size_bytes, request_sizes_by_kind), and PACS Variable-dict JSON keys (_setpoint, _mean, _data, _dims, ...). These are wire-schema / on-disk contract; renaming them would break historical replication dumps + force in-lockstep JSON-config edits. Python identifiers flip; disk schemas stay.
- R15 terminology swept in notes/context.md + notes/objective.md: "improve reliability" → "raise reliability", "signals degrade" → "signals fall", "improves freshness" → "raises freshness", "degrades both" → "lowers both". Third-party citation titles (Arteaga Martin / Correal Torres paper) preserved verbatim.
- New plotter plot_dim_topology: dimensional analog of plot_qn_topology, mirrors the 3/4 graph + 1/4 table layout. Default color_by="eta" (min-max normalised because eta is unbounded), 2-line node labels (key + theta), architecture-average overlay $\bar{\theta}, \bar{\sigma}, \bar{\eta}, \bar{\phi}$ in the top-right lightblue box, full coefficient table below the graph. Wired into 03-dimensional.ipynb as section 4. data/img/dimensional/<adp>/topology.{png,svg} now regenerates for every adaptation, bringing dimensional into layout parity with analytic. plot_nd_heatmap deliberately kept intact: still called on baseline, still emits nd_heatmap.{png,svg}.
- Tests: 338 passing, ~6 min wall clock. Notebooks 01-05 re-executed end-to-end; 06-comparison carries a pre-existing ImportError: _async_run (method 5 not yet built, unrelated to these renames).
- Policy pins extracted (now in CLAUDE.md): (i) wire-schema identifiers off-limits to Python renames; (ii) PACS Variable-dict JSON keys are contract and never touched by a sweep; (iii) scoped renames beat global regex when two modules intentionally share a name; (iv) notes/audit.md and notes/devlog.md skipped in whole-repo sweeps, since they're historical record; (v) dict-subscript ["NAME"] false-positives need manual review after every whole-word regex sweep.
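Policy pin (v) is easy to reproduce in a few lines. The sketch below is illustrative only: the sample line is hypothetical, and the quote-aware guard is one possible mitigation, not the tooling actually used in the sweep.

```python
import re

# Hypothetical source line: a Python identifier and an on-disk key share
# the same spelling. Only the identifier should be renamed.
line = 'service_name = row["service_name"]'

# A whole-word sweep cannot tell them apart: \b also matches next to a quote.
naive = re.sub(r"\bservice_name\b", "svc_name", line)

# Assumed guard: skip matches that sit directly inside quotes.
guarded = re.sub(r"""(?<!["'])\bservice_name\b(?!["'])""", "svc_name", line)

print(naive)
print(guarded)
```

The guard only covers the plain quoted-subscript case; f-strings, comments and docstrings would still need the manual review the policy asks for.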
Gap flagged, not closed: tests/view/test_qn_diagram.py does not exist; the plotter module is ~1300 lines and a pixel-level regression test is out of scope for this pass. Recorded as an audit gap in notes/audit.md Stage 0.10 close.
Why now. The user initiated the walk to bring the codebase to a consistent convention floor before the comparison method (method 5) lands on top. Drain the queue, pin the policies, move on.
Removed the duplicated handler step-order body that had grown in services/atomic.py and services/composite.py. The two handlers were functionally identical — service-time sleep, epsilon Bernoulli, routing pick, dispatch, wrap with @logger(ctx) — but with three composite-only wrinkles (kind-dispatch at entry, in-process sibling lookup, per-member routes). The duplication was bounded but about to cost us: notes/experiment.md §6.3 pins several observables (mu_measured, epsilon_measured, chi_measured, Little's-law check) that would have forced parallel edits in both files before method 5 could land.
- src/experiment/services/atomic.py: added two keyword-only extension points, pick_target(ctx, req) -> target | None (default: Jackson-weighted pick over targets) and dispatch(target, req) -> ServiceResponse (default: await external_forward(target, req)). Both defaults reproduce the pre-refactor atomic behaviour byte-for-byte. mount_atomic_service now also stashes the @logger-wrapped handler on ctx.handler so composite callers can reach it for sibling dispatch.
- src/experiment/services/base.py: ServiceContext gains one optional field: handler: Optional[Callable] = field(default=None, init=False, repr=False). Set by mount_atomic_service after the handler is built; unused by atomic-only callers (third-party services).
- src/experiment/services/composite.py: rewritten to call mount_atomic_service once per member, injecting a shared _handlers dict through a _dispatch closure (in-process first, external-forward second) and an entry-only _pick closure that reads kind_to_target (raising HTTP 400 on unknown kind, matching the prior behaviour). The handler step-order now lives in ONE function.
- Line count: atomic 97 -> 129, composite 160 -> 135. Net ~neutral; the win is structural, not size.
- Tests: 147 experiment tests pass unchanged (byte-equivalent behaviour). Both demos (demo_tas.py, demo_third_party.py) still run clean.
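A minimal sketch of the two keyword-only extension points, assuming a plain dict stands in for ServiceContext and that requests and responses are dicts. build_handler, _fake_forward and the routes shown are hypothetical names for illustration; the real mount_atomic_service also handles service time, Bernoulli failure and logging, which are omitted here.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional

Forward = Callable[[str, dict], Awaitable[dict]]


def build_handler(
    ctx: dict,
    external_forward: Forward,
    *,
    pick_target: Optional[Callable[[dict, dict], Optional[str]]] = None,
    dispatch: Optional[Forward] = None,
):
    """Return an async handler; both hooks default to the atomic behaviour."""

    def _default_pick(ctx: dict, req: dict) -> Optional[str]:
        # Weighted pick over routing probabilities; None means terminal.
        targets = ctx.get("targets", {})
        if not targets:
            return None
        return random.choices(list(targets), weights=list(targets.values()), k=1)[0]

    pick = pick_target or _default_pick
    send = dispatch or external_forward

    async def handler(req: dict) -> dict:
        target = pick(ctx, req)
        if target is None:
            return {"served_by": ctx["name"]}
        return await send(target, req)

    return handler


async def _fake_forward(target: str, req: dict) -> dict:
    # Stand-in for the external HTTP hop.
    return {"served_by": target}


# Atomic caller: no hooks supplied, defaults apply.
atomic = build_handler({"name": "AS_{1}", "targets": {"MAS_{1}": 1.0}}, _fake_forward)
# Composite entry: kind-dispatch injected through pick_target.
entry = build_handler(
    {"name": "TAS_{1}", "targets": {}},
    _fake_forward,
    pick_target=lambda ctx, req: "TAS_{2}" if req.get("kind") == "analyse" else None,
)
atomic_result = asyncio.run(atomic({"kind": "analyse"}))
entry_result = asyncio.run(entry({"kind": "analyse"}))
```

Keeping both hooks keyword-only means an atomic-only caller never has to know they exist.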
Why not yesterday. Yesterday's style passes kept the two handlers sibling (deliberately — scope discipline, see feedback_skill_pass_scope_discipline.md). Today the question "can composite be rewritten on atomic?" made it worth the separate commit: the tradeoff flipped once the prototype audit listed multiple upcoming M/M/c/K observables that would land in the step-order code path.
Where the subtlety went. The trick that made the old code non-trivial — "shared _handlers dict populated after each member is mounted, consulted at request time via late-bound lookup" — is still in composite, but now it's one 4-line _dispatch closure instead of 40 lines of inline plumbing. That is the legitimate thing to understand when reading composite; everything else is library.
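The late-bound lookup can be sketched in isolation. This is a simplified illustration under assumptions: mount_members and the string return values are hypothetical, and the real closure wraps the @logger-decorated handlers.

```python
import asyncio


async def external_forward(target, req):
    # Stand-in for an HTTP hop to a service outside the composite.
    return f"external:{target}"


def mount_members(member_names):
    handlers = {}  # shared registry, still empty when _dispatch is defined

    async def _dispatch(target, req):
        # Looked up at request time, so members mounted later are visible.
        sibling = handlers.get(target)
        if sibling is not None:
            return await sibling(req)  # in-process first
        return await external_forward(target, req)  # external-forward second

    for name in member_names:
        async def _handler(req, _name=name):
            return f"in-process:{_name}"

        handlers[name] = _handler

    return _dispatch


dispatch = mount_members(["TAS_{2}", "TAS_{3}"])
in_process = asyncio.run(dispatch("TAS_{3}", {}))
external = asyncio.run(dispatch("MAS_{1}", {}))
```

Because the dict is consulted per request, the order in which members are mounted does not matter.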
Second module covered by the 2026-04-22 skill-pass sweep (third_party was first; pattern captured earlier in the day).
- src/experiment/instances/tas.py: tightened module docstring (added usage example; removed the imprecise "TAS_{2..4} Jackson-weighted / TAS_{5,6} terminal" phrasing that did not match composite.py's real dispatch tree; stated kind-dispatch-vs-Jackson split up front). Function docstring now mentions the HTTP 400 on unknown kind, the app.state.tas_components side-effect, and the entry_name keyword-only default.
- tests/experiment/instances/test_tas.py: dropped the back-compat alias build_tas as make_tas_service (exactly the kind of drift the verb-first-rename memory flags). Scrubbed stale jargon from the module docstring ("Option-B" is a registry-level vocabulary term; "M/M/c/K invariants per component" is wrong, because the apparatus explicitly does not enforce those). Added *test_name()* lead-ins to every test method; tightened fixture docstrings; ASCII'd >= 1 (was Unicode ≥).
- src/scripts/demo_tas.py (new): three-section walkthrough: kind-dispatch at TAS_{1}, in-process chain TAS_{1} -> TAS_{2} -> TAS_{3} with per-member logs, external-forward boundary at TAS_{2} -> MAS_{1}. Same idiom as demo_third_party.py / demo_services.py. Verified by invocation.
- Suite: 147 experiment-side tests pass in 11.7 s.
- tests/methods/test_experiment.py drift is still there and still out of scope (same orthogonal ClientConfig.kind_weights failure as the earlier third_party pass).
Scope discipline. Sibling files surfaced Option-B / ServiceState references in test_registry.py + test_seed.py but those cover different source modules (registry.py, base.py) — left alone per the scope-discipline rule (see feedback_skill_pass_scope_discipline.md).
Applied the code-documentation + coding-conventions + test-layout skills to src/experiment/instances/third_party.py and its associated tests.
- src/experiment/instances/third_party.py: tightened module docstring (added usage example; fixed stale (spec, routing_row, forward) note that no longer matched the signature; stated terminal vs forwarding behaviour up front); function docstring now pairs the targets argument with external_forward semantics explicitly.
- tests/experiment/instances/test_third_party.py (new): 5 TestClass / 6 tests covering app structure, terminal service, external-forward, Bernoulli (eps=1.0) failure, and log-row schema. Full **TestClass** + *test_name()* docstring convention; mu=1e9 trick to keep per-test wall clock near-zero. All green in 7.7 s.
- tests/experiment/test_mem_budget.py: deleted TestBudgetEnforcement413 class and the unused make_atomic_service shim. FR-2.4 runtime enforcement is deferred per notes/prototype.md §7 item 3; the 413 tests were red-by-accident (one failing, one passing-for-the-wrong-reason). Rewrote module docstring with **TestClass** bullets matching test_tas.py / test_third_party.py; added *test_name()* lead-ins across the surviving tests.
- src/scripts/demo_third_party.py (new): three-section walkthrough (terminal / forwarding / Bernoulli) matching the existing demo_services.py / demo_registry.py / demo_client.py / demo_payload.py idiom: _banner, sys.path boot, numbered sections, async def _demo(), sync main(). Verified by invocation.
- Suite: 316 tests pass outside tests/methods/test_experiment.py (the 1 fail + 10 errors there are pre-existing drift from the experiment scope reset, ClientConfig.kind_weights must sum to > 0; orthogonal to this pass).
Pattern captured: when the skill pass touches a module whose sibling tests are already green, keep the scope tight: polish docstrings, fix stale references, delete dead code, add one demo. Don't chase unrelated failures surfaced along the way; log them instead.
The existing prototype (4/5) runs and tests pass, but it was built as "a working FastAPI replica" instead of "apparatus for a hypothesis-driven experiment". The scientific-method framing — hypothesis → model → prototype → validation — was not explicit in the design, so operating points ([1, 2, 5, ..., 500] req/s), tolerances, and acceptance criteria are all ad hoc rather than derived from what would prove/disprove the tech-agnosticism claim.
- Drafted notes/prototype-req.md with the experimental-design framing: hypothesis H1 (per-artifact |ρ_meas − ρ_pred| ≤ τ_ρ across adaptations), explicit reference model (analytic), FR-1..8 for the prototype apparatus, and a validation protocol that lives in a new notebook 06. Scope of the reset TBD; it will be decided after the FR review.
- Open-questions section (§7 in the FR doc) lists 7 items for user review: hypothesis phrasing, tolerances, grid points, adaptation scope, profile coverage, notebook split, skill creation.
User flagged 2026-04-20: some graphs in 04-yoly.ipynb are incorrect. Needs a pass after the prototype-req.md review is settled. Capture the specific mistakes and fix in a dedicated commit (don't bundle with the experiment reset).
- Strip all CS-2 (IoT-SDP) mentions from notes/. cs_context.md and cs_objective.md were imported with both case studies in-tree as working context for CS-1; once the full pipeline (analytic, stochastic, dimensional, experiment, comparison methods + notebooks + tests) is green, purge the CS-2 sections, tables, ADRs (ADR-CS2-*), references (lines 764-782 of cs_context.md), and any cross-references. Post-implementation only: do not touch before the pipeline is reproducing __OLD__ results.
2026-04-20 — Experiment method complete (4/5): FastAPI architectural replication + tech-agnostic validation
Delivered. Fourth of five evaluation methods in place. A FastAPI microservice replication of the TAS topology, deployed in-process via ASGI transport and routed by a shared httpx.AsyncClient. No dependency on ReSeP / ActivFORMS abstractions -- the point is to validate DASA's technology-agnosticism: if DASA's coefficients characterise the architecture rather than the implementation, they should transfer to a vanilla Python/FastAPI stack.
- src/experiment/ (6 modules):
  - services/base.py: ServiceSpec (immutable knobs from profile JSON), ServiceState (runtime: admission lock, c-slot semaphore, log buffer), ServiceRequest / ServiceResponse wire schema, log_request decorator enforcing M/M/c/K semantics (K admission + c capacity + Exp service time + Bernoulli failure + per-invocation CSV row).
  - services/atomic.py: make_atomic_service(spec) for MAS / AS / DS.
  - services/composite.py: make_composite_service(spec, pattern, downstream_targets) for TAS_{1..6}.
  - patterns.py: four adaptation patterns: no_adapt (baseline), retry (s1), parallel_redundant (s2), retry_parallel_redundant (aggregate). Plain async Python, no framework.
  - client.py: ClientSimulator with Poisson interarrival + λ-ramp (run_ramp mirrors the yoly sweep pattern; cascade-fail early stop).
  - launcher.py: ExperimentLauncher wires the 13-service mesh via a custom _MultiASGITransport that routes httpx.AsyncClient requests per-port to the right FastAPI app. Context-manager API for setup / teardown.
  - registry.py: ServiceRegistry resolves name -> URL from data/config/method/experiment.json.
- src/methods/experiment.py: standard orchestrator contract (run(adp, prf, scn, wrt, method_cfg=None)) + CLI. Runs the ramp, aggregates per-service CSVs, emits the analytic-compatible per-node DataFrame + network aggregate + R1/R2/R3 verdict.
- data/config/method/experiment.json: deployment-only config (ports, ramp schedule, pre-measured request sizes). Does NOT duplicate DASA knobs (mu, epsilon, c, K, routing); those still live in data/config/profile/<dflt|opti>.json.
- 05-experiment.ipynb: thin notebook with validation plots: per-artifact measured ρ vs predicted ρ scatter (headline tech-agnosticism plot), per-step p50/p95 response time, R1/R2/R3 verdict table.
- Tests: 39 new (17 service-layer + 10 pattern + 3 launcher + 9 orchestrator). Total suite 177 tests pass in ~3 min.
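For readers unfamiliar with the client side, here is a rough sketch of a Poisson arrival stream driven across a λ ramp. It assumes linear spacing and that lambda_steps counts grid points; ramp_rates and poisson_arrival_times are hypothetical helpers, not ClientSimulator's actual methods, and the rates are illustrative.

```python
import random


def ramp_rates(lam_max, start_frac, steps):
    """Evenly spaced rates from start_frac * lam_max up to lam_max."""
    if steps < 2:
        return [lam_max]
    lo = start_frac * lam_max
    step = (lam_max - lo) / (steps - 1)
    return [lo + i * step for i in range(steps)]


def poisson_arrival_times(lam, n, rng):
    """n arrival timestamps with exponential gaps of mean 1 / lam."""
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.expovariate(lam)
        times.append(t)
    return times


rates = ramp_rates(lam_max=500.0, start_frac=0.2, steps=5)
arrivals = poisson_arrival_times(rates[0], n=1000, rng=random.Random(42))
```

Passing an explicit random.Random instance keeps each ramp step reproducible under a seed without touching the global generator.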
Key design decisions (see notes/experiment.md for rationale):
- FastAPI + uvicorn (via ASGI transport) + httpx + pytest-asyncio. Async is non-negotiable; time.sleep() would block workers and destroy the M/M/c/K queue semantics. await asyncio.sleep(Exp(1/mu)) matches the closed-form assumption.
- Request size as HTTP header metadata, never psutil. Client pre-samples size_bytes from the method config's per-kind map and propagates through the chain. Zero runtime noise, fully deterministic under seed.
- K admission + c service semaphore inside the app. Real queue semantics even without uvicorn's --limit-concurrency (which only fires on TCP binding, not in-process ASGI). state.admit() raises 503 when in_system >= K; state.service_sem = Semaphore(c) gates concurrent processing. Verified: a burst of 5 concurrent requests at K=2 produces >=3 rejections; c=2 caps concurrent processing.
- In-process ASGI mesh over real uvicorn servers (for v1). _MultiASGITransport routes httpx requests per-port to the right FastAPI app without binding ports. Fast + hermetic tests; no multiprocess orchestration complexity. Real uvicorn can be swapped in later if TCP-level realism matters.
- λ ramp mirrors the yoly sweep. ClientSimulator.run_ramp() goes from lambda_start_frac * λ_max to λ_max in lambda_steps increments; cascade-fail early stop when network-wide fail rate exceeds cascade_fail_rate_threshold. Output maps to coefficient trajectories comparable to 04-yoly.ipynb's sweep_architecture cloud.
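The K admission + c semaphore decision can be sketched without FastAPI. The version below is a simplified stand-in: AdmissionFull replaces the HTTP 503, the service time is fixed instead of exponential, and the class only mirrors the names quoted above (admit, in_system, service_sem); everything else is an assumption.

```python
import asyncio


class AdmissionFull(Exception):
    """Stand-in for the HTTP 503 raised when the system is full."""


class ServiceState:
    def __init__(self, c, K):
        self.K = K
        self.in_system = 0
        self.service_sem = asyncio.Semaphore(c)

    def admit(self):
        # K bounds everything in the system: waiting plus in service.
        if self.in_system >= self.K:
            raise AdmissionFull
        self.in_system += 1

    def release(self):
        self.in_system -= 1


async def handle(state, service_time):
    state.admit()
    try:
        async with state.service_sem:  # at most c requests in service
            await asyncio.sleep(service_time)  # non-blocking, unlike time.sleep
        return "ok"
    finally:
        state.release()


async def burst(n, c, K):
    state = ServiceState(c, K)
    results = await asyncio.gather(
        *(handle(state, 0.01) for _ in range(n)), return_exceptions=True
    )
    return sum(isinstance(r, AdmissionFull) for r in results)


rejected = asyncio.run(burst(n=5, c=1, K=2))
```

Because admit() runs synchronously before the first await, the admission check needs no extra lock in a single-threaded event loop; whether the project keeps its admission lock for other reasons is not covered here.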
Deliberately NOT done (documented as "v2" in notes/experiment.md):
- Real uvicorn + TCP deployment (would measure real network overhead).
- Multi-kind workflow (v1 only sends kind="analyse"; alarm / drug paths through TAS_{3,4} not exercised).
- Multiprocess launcher (in-process is sufficient for DASA validation at the service-composition level).
Pipeline status. 4 of 5 methods complete. Next: comparison (method 5) — aggregates analytic / stochastic / dimensional / experiment into a cross-method R1/R2/R3 verdict and delta plots.
Session artifact cleanup. _rebuild_experiment.py (one-off notebook scaffolder) was deleted after use per the project's no-scaffolder-in-git convention.
Delivered. Third of five evaluation methods in place.
- src/dimensional/: five thin adapters around PyDASA 0.7.1: schema.build_schema(), engine.build_engine(), coefficients.derive_coefficients() (config-driven via {pi[i]} placeholder spec), sensitivity.analyse_symbolic(), reshape.{coefficients_to_nodes, coefficients_to_network, coefficients_delta, network_delta}. Each module under 90 lines; PyDASA owns all the math.
- data/config/method/dimensional.json: FDUs (T, S, D), coefficient specs ({pi[i]} patterns for θ, σ, η, φ), sensitivity settings, and a sweep_grid (6 μ-factors × 4 c × 4 K) earmarked for yoly.ipynb (Phase 3b/c).
- src/methods/dimensional.py: orchestrator with run(adp, prf, scn, wrt, method_cfg=None) + CLI; mirrors analytic/stochastic contract. No requirements.json: dimensional characterises the design space, not operational thresholds.
- dimensional.ipynb (new): 9-section thin notebook built via scripts/build_dimensional_notebook.py (reproducible regen). Runs all 4 adaptations and plots per-node heatmap / diffmap / network bars / delta for θ, σ, η, φ, all reusing existing src.view.qn_diagram plotters; no new view module needed for this notebook.
- Tests: 34 engine-level (schema, engine, coefficients, sensitivity, reshape) + 22 orchestrator-level = 56 new; 138 total pass in ~6 min.
Key finding mid-Phase-3a: PyDASA reads _std_mean, not _mean. The PACS Variable-dict carries both _mean / _setpoint (scenario-display) and _std_mean / _std_setpoint (canonical-units, what pydasa consumes). Only _std_* flows into Coefficient.calculate_setpoint(). Any seed / override must update both halves.
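A small sketch of what "update both halves" means in practice. seed_variable and its to_std conversion factor are hypothetical; only the four field names come from the Variable-dict described above, and the numbers are placeholders.

```python
def seed_variable(var, value, to_std=1.0):
    """Write one value into both the display and the canonical fields.

    to_std is an assumed display-to-canonical unit conversion factor.
    """
    std_value = value * to_std
    var["_setpoint"] = value
    var["_mean"] = value
    var["_std_setpoint"] = std_value
    var["_std_mean"] = std_value
    return var


# Placeholder variable with a stale canonical half.
lam = {"_setpoint": 2.0, "_mean": 2.0, "_std_setpoint": 2.0, "_std_mean": 2.0}
seed_variable(lam, 3.5)
```

Writing only _mean would leave _std_mean stale, and the stale value is the one the coefficient calculation would read.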
Seeded dimensional from analytic results. The profile JSON's static L / W / Lq / Wq / λ / χ _mean values were inherited from the OLD CSV and did not reflect per-adaptation operating points — every artifact came out with θ=0.6 uniformly. Fixed via src/utils/seed_dim_from_analytic.py: runs analytic on a representative scenario per profile (baseline for dflt.json, aggregate for opti.json) and writes the solver's per-node λ, χ, L, L_q, W, W_q back into the variable _setpoint, _mean, _std_setpoint, _std_mean, _data fields. Also refreshes M_{act} (depends on L). Post-seed baseline θ varies 0.005 (AS_{3}) to 0.21 (MAS_{3}); σ ≈ 1.0 uniformly (Little's-law sanity check).
Limitation of the opti seed. Only 13 of 16 opti artifacts are seeded — the three pre-adaptation swap-out artifacts (MAS_{3}, AS_{3}, DS_{3}) do not appear in the aggregate scenario's artifact list, so their _mean values remain stale. If dimensional is later invoked on s1 / s2 (which use a subset of those pre-adaptation artifacts), the stale fields will flow through. Acceptable for now per "seed once" scope; can extend to merge across scenarios later if needed.
Notebook convention. dimensional.ipynb is generated from scripts/build_dimensional_notebook.py; edit the script, re-run, commit both. Keeps the notebook in git as a snapshot while the source of truth remains Python.
Why. Before starting the dimensional engine, the TAS profile configs needed to line up with the PACS reference framework {T, S, D} used by the two illustrative-example iterations (__OLD__/src/exports/dimensional_{1,2}_draft.py). Two gaps were blocking Phase 1:
- FDU symbol drift. TAS used E (entity) for the request dimension; PACS (authoritative reference) uses S (structure). Same semantics, incompatible strings. PyDASA's Schema would reject every artifact.
- Missing D-dimension. \delta_{X} (data density, kB/req) was present in every artifact but flagged relevant: false, and the companion memory variables M_{act, X} / M_{buf, X} were absent. Without them the Buckingham matrix has no D coverage and \phi (memory-usage coefficient) cannot be derived.
What. One-shot utility src/utils/migrate_dim_schema.py does three things per artifact:
- Rename token E → S in every _dims expression (117 in dflt.json, 144 in opti.json).
- Flip \delta_{X}.relevant = true (13 in dflt.json, 16 in opti.json).
- Insert M_{act, X} and M_{buf, X} with _dims="D", _units="kB", _cat="CTRL", relevant=true, _dist_type="data_product". Setpoints derived from existing setpoints:
  - M_{act, X}._setpoint = L_{X}._setpoint × \delta_{X}._setpoint (active memory)
  - M_{buf, X}._setpoint = K_{X}._setpoint × \delta_{X}._setpoint (allocated buffer)
For TAS_{1}: M_{act} = 6 × 1064 = 6384 kB, M_{buf} = 10 × 1064 = 10640 kB.
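The two setpoint derivations reduce to one multiplication each. A minimal check of the TAS_{1} figures, with derive_memory_setpoints as a hypothetical helper name:

```python
def derive_memory_setpoints(L, K, delta):
    """Return (M_act, M_buf) in kB from L [req], K [req] and delta [kB/req]."""
    m_act = L * delta  # active memory: requests currently in the system
    m_buf = K * delta  # allocated buffer: full system capacity
    return m_act, m_buf


m_act, m_buf = derive_memory_setpoints(L=6, K=10, delta=1064)
print(m_act, m_buf)  # 6384 10640
```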
Provenance of the numbers.
- K = 10 req: canonical per CLAUDE.md ("every artifact has c=1 and K=10"); matches __OLD__/data/config/cs1/default_dim_variables.csv (mean=10, range=[5,15]); PACS iter1 used K_max=16 (same ballpark).
- \delta = 1064 kB/req: inherited verbatim from the OLD CSV's dimensional variable catalogue; anchored to medical-record / DICOM payload sizes (~1 MB typical). Not a direct citation from Weyns & Calinescu 2015; the paper does not quantify payload size. This is an educated domain estimate applied uniformly across the 13 artifacts.
- M_buf = K · \delta and M_act = L · \delta: derived, not guessed. The only dimensionally-consistent interpretation of "buffer capacity in memory units".
Outcome. 70 existing tests still green (pytest tests/ in ~12s). Schema is now compatible with PyDASA's Schema / Buckingham pipeline. Phase 1 of the dimensional method (engine + config-driven FDUs + coefficients) unblocked.
Delivered. Second of five evaluation methods in place; SimPy DES engine + NetworkConfig wrapper agrees with the closed-form analytic solution within Monte-Carlo noise across every adaptation.
- src/stochastic/simulation.py: engine (QueueNode, simulate_network, job, job_generator) + solve_network(cfg, method_cfg) adapter in a single file (mirrors src/analytic/jackson.py). Seeds both random and numpy.random at the start of each multi-rep call for reproducibility.
- src/methods/stochastic.py: run(adp, prf, scn, wrt, method_cfg=None) orchestrator + CLI. The method_cfg kwarg lets tests inject an abbreviated config without touching disk.
- src/view/qn_diagram.py: seventh plotter, plot_nd_ci(nds, *, metric, reference=None, reps=N, confidence=0.95, ...). Errorbar-on-points chart with optional analytic overlay as red x markers. Used in §6 of stochastic.ipynb.
- stochastic.ipynb: nine sections, thin notebook; renders topology / heatmap / diffmap / CI (ρ + W) / net_bars / net_delta under data/img/stochastic/<scenario>/ (22 figure files, PNG + SVG each).
- Tests: 19 new (9 engine, 10 orchestrator) using _QUICK_CFG (3 reps × 1000 invocations / 100 warmup) for ~30x speedup. 70 total pass in ~9s.
Invocation → seconds bridge. Method config declares horizon_invocations / warmup_invocations (unitless counts); the SimPy engine runs in time. Conversion seconds = invocations / sum(lambda_z) lives in solve_network. Don't move it — keeps simulate_network unit-agnostic.
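The bridge is a one-line division. A sketch with illustrative inputs, assuming all external arrivals enter at a single node; invocations_to_seconds is a hypothetical name for the conversion that lives inside solve_network:

```python
def invocations_to_seconds(invocations, lambda_z):
    """Simulated seconds needed to observe a given number of invocations."""
    total_rate = sum(lambda_z)  # external arrivals per second, all entry nodes
    if total_rate <= 0:
        raise ValueError("sum(lambda_z) must be positive")
    return invocations / total_rate


# Illustrative: 10k invocations, 345 req/s entering at one node of 13.
lambda_z = [345.0] + [0.0] * 12
horizon_s = invocations_to_seconds(10_000, lambda_z)
warmup_s = invocations_to_seconds(1_000, lambda_z)
```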
Cross-method sanity. Every analytic per-node ρ falls INSIDE the stochastic 95% CI band on the baseline figures (data/img/stochastic/baseline/nd_ci_rho.png). Aggregate W_net: analytic 3.09 ms, stochastic 3.10 ms. The two methods mutually validate.
Data/reference housekeeping. Merged data/reference/version.txt + data/reference/profile.md into a single summary.md; dropped the sources.
Dimensional method split into TWO notebooks (user decision 2026-04-19):
- dimensional.ipynb: pre/post adaptation solution, but plotting coefficients (θ, σ, η, φ) not queue metrics, reusing the existing heatmap / diffmap / bars / delta plotters with coefficient columns.
- yoly.ipynb: configuration-sweep diagram (plot_yoly_* family ported from __OLD__/src/notebooks/src/display.py), shows how TAS behaves across a sweep of configurations. New sibling view module src/view/yoly_diagram.py to keep queue-network and yoly visuals separate.
- Plan captured in memory (project_dimensional_plan.md) for the next session to pick up.
Next: start src/dimensional/ engine + two notebooks.
Delivered. Silent config drift found and fixed; baseline Jackson solution now matches __OLD__/data/results/cs1/data/dflt_analytical_{node,net}_metrics.csv to the 6th decimal place on every per-node row and every network-wide aggregate.
- c=1, K=10 canonical values restored across every artifact in both data/config/profile/dflt.json and opti.json. dflt.json had silently drifted to c=2 (halving every utilisation); opti.json also had K=6 (tightened during some earlier test). One-shot repair utility at src/utils/fix_c_k.py: ran once, left in place as a frozen record.
- Artifact + variable keys migrated to LaTeX form. Artifact JSON keys: TAS_1 -> TAS_{1}, MAS_3 -> MAS_{3}, etc. Variable keys with q-subscripts split correctly: Lq_{TAS_{1}} -> L_{q, TAS_{1}}, Wq_{TAS_{1}} -> W_{q, TAS_{1}}. One-shot migration utility at src/utils/rename_keys.py. ArtifactSpec._sub() collapsed to identity (key IS the LaTeX subscript now).
- Baseline headline numbers (exact match with OLD CSV): avg_mu=653.85, avg_rho=0.29728, L_net=6.98730, Lq_net=3.12884, W_net=3.437 ms, Wq_net=1.541 ms, TP_net=2038.50. Per-node rows also match (MAS_3: rho=0.694, L=2.068, W_q=0.01336).
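The two key rewrites can be expressed as anchored regex substitutions. This is a sketch of the transformation only; the function names are hypothetical and src/utils/rename_keys.py may be implemented differently.

```python
import re


def rename_artifact_key(key):
    """TAS_1 -> TAS_{1}; keys already in braced form are left untouched."""
    return re.sub(r"_(\d+)$", r"_{\1}", key)


def split_q_subscript(key):
    """Lq_{TAS_{1}} -> L_{q, TAS_{1}}; same for Wq."""
    return re.sub(r"^([LW])q_\{(.+)\}$", r"\1_{q, \2}", key)


print(rename_artifact_key("MAS_3"))
print(split_q_subscript("Wq_{TAS_{1}}"))
```

Anchoring both patterns keeps the migration idempotent: a second run over already-migrated keys changes nothing.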
src/view/qn_diagram.py grew to six plotters with a uniform signature contract (keyword-only after required positionals, return Figure, save both PNG+SVG via _save_figure): plot_qn_topology, plot_qn_topology_grid, plot_nd_heatmap, plot_nd_diffmap, plot_net_bars, plot_net_delta. Ported _generate_color_map from __OLD__/src/notebooks/src/display.py for the multi-scenario palette. Fixed the SVG-dark-theme text-invisibility gotcha: _TEXT_BLACK = "#010101" (not pure "black") forces matplotlib to emit an explicit fill attribute that dark-theme viewers cannot override.
Notebook (analytic.ipynb, 17 cells under the 30-cell budget) produces one standalone topology per adaptation + per-node heatmap + per-node diffmap + network-wide bars + network-wide delta bars — 20 figures total under data/img/analytic/<scenario>/ (PNG + SVG for each of 10 figure types). Outputs cleared before commit.
Tests: 51 green (11 queues, 4 jackson, 12 metrics, 11 io/config, 13 methods/analytic).
Pitfalls captured in memory (so they do not return): c=1, K=10 canonical values; LaTeX key format; uniform arc3,rad=0.2 for self-loops (custom rad=1.0 overlaps cross-edges); #010101 text colour. See CLAUDE.md §View (Plotting) Conventions and Claude memory project entries.
Next method in the pipeline: src/stochastic/ (SimPy DES). Config already at data/config/method/stochastic.json.
Delivered. First end-to-end evaluation method is green across the full 4-adaptation axis; analytic.ipynb reproduces the metrics table and 11 figures from a cold clone.
- src/analytic/: queues.py (registry-dispatch Queue() factory + BasicQueue ABC + QueueMM1 / QueueMMs / QueueMM1K / QueueMMsK concrete classes; _QUEUE_MODELS dict at module bottom makes adding new models one entry), jackson.py (solve_jackson_lambdas() linear core + solve_network() wrapper), metrics.py (aggregate_network() + check_requirements() with JSON-backed thresholds).
- src/view/qn_diagram.py: 5 plotters (plot_qn_topology, plot_qn_topology_grid, plot_nd_heatmap, plot_net_bars, plot_net_delta) with a uniform param-IO convention (keyword-only args after required positionals; every plotter returns Figure and persists when file_path + fname given). Shared _save_figure(), _resolve_metrics(), _resolve_labels() helpers.
- src/methods/analytic.py: run(adp, prf, scn, wrt) orchestrator + CLI. The written envelope carries the full routing (13x13) and lambda_z (13) fields alongside metrics so downstream consumers can reconstruct paths without re-opening configs.
- analytic.ipynb at repo root: thin notebook (20 cells, under the 30-cell budget). Calls run() across the 4 adaptations, prints the summary + verdict tables, saves 11 figures under data/img/analytic/<adaptation>/. Clears outputs before commit.
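The registry-dispatch idea in queues.py can be sketched with a single model. Only the class and dict names come from the entry above; the constructor signature, the "M/M/1" key string and the method names are assumptions, and only the textbook M/M/1 formulas are shown.

```python
from abc import ABC, abstractmethod


class BasicQueue(ABC):
    def __init__(self, lam, mu):
        self.lam = lam  # arrival rate
        self.mu = mu  # service rate

    @abstractmethod
    def rho(self):
        """Utilisation."""

    @abstractmethod
    def L(self):
        """Mean number in system."""


class QueueMM1(BasicQueue):
    def rho(self):
        return self.lam / self.mu

    def L(self):
        r = self.rho()
        return r / (1.0 - r)


# Registry at module bottom: a new model is one extra entry.
_QUEUE_MODELS = {"M/M/1": QueueMM1}


def Queue(model, **kwargs):
    try:
        cls = _QUEUE_MODELS[model]
    except KeyError:
        raise ValueError(f"unknown queue model: {model!r}") from None
    return cls(**kwargs)


q = Queue("M/M/1", lam=1.0, mu=2.0)
```

Callers only ever import Queue, so adding a model never touches call sites.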
Thresholds externalised. data/reference/baseline.json now holds the Camara 2023 R1 / R2 / R3 values (0.0003, 0.026 s, null); metrics.py reads them via src.io.load_reference("baseline"). No more hardcoded _R1_MAX_FAIL_RATE / _R2_MAX_RESP_TIME in Python.
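A sketch of a threshold check driven by the externalised values. The JSON key names, the inclusive comparison and the inline string standing in for data/reference/baseline.json are assumptions; R3 is treated as derived from R1 and R2 because its threshold is null. The fail rate used in the call is a placeholder; the response time is the baseline W_net from the table in this entry.

```python
import json

# Inline stand-in for data/reference/baseline.json; key names are assumed.
_BASELINE = json.loads('{"R1": 0.0003, "R2": 0.026, "R3": null}')


def check_requirements(fail_rate, resp_time_s, thresholds=_BASELINE):
    """Return a pass/fail verdict per requirement."""
    r1 = fail_rate <= thresholds["R1"]  # failure-rate ceiling
    r2 = resp_time_s <= thresholds["R2"]  # response-time ceiling, seconds
    r3 = r1 and r2  # no threshold of its own
    return {"R1": r1, "R2": r2, "R3": r3}


verdict = check_requirements(fail_rate=0.0001, resp_time_s=0.00199)
```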
Headline numbers at 345 req/s (all four adaptations PASS R1 / R2 / R3):
| adaptation | W_net (ms) | avg_rho | max_rho | bottleneck |
|---|---|---|---|---|
| baseline | 1.99 | 0.149 | 0.347 | MAS_3 |
| s1 | 2.01 | 0.164 | 0.375 | MAS_3 |
| s2 | 2.08 | 0.168 | 0.356 | DS_1 |
| aggregate | 1.95 | 0.161 | 0.345 | DS_1 |
Aggregate is the best configuration on both W_net and max_rho; s1 alone is the worst on max_rho because opti routing pushes more load into the dflt services at the three swap slots (MAS, AS, DS). Bottleneck shifts from MAS (dflt services) to DS (opti services) as soon as s2 / aggregate activate.
Tests. 51 pytest cases green: 11 queues, 4 jackson, 12 metrics (includes 3 pinning thresholds to the JSON), 11 io/config, 13 methods/analytic. Notebook runs cold without manual intervention.
Housekeeping.
- data/results/ tracked as a directory (1 .gitkeep + local .gitignore); generated JSONs remain ignored.
- src/utils/import_old.py removed; migration script served its purpose; dflt.json / opti.json are the sources of truth.
- conftest.py kept with a TODO pointing at the eventual pyproject.toml replacement.
Pending. 4 methods still unbuilt (stochastic, dimensional, experiment, comparison); assets/ documentation staging directory still empty.
Delivered.
- opti.json artifacts expanded from 13 to 16. The three swap slots (nodes 6, 9, 11) now carry BOTH variants: MAS_3 (dflt) alongside MAS_4 (opti), AS_3 / AS_4, DS_3 / DS_1. The opti CSV's name column (MAS 3->4, AS 3->4, DS 3->1) motivated distinct artifact keys instead of silently overwriting values in-place.
- _nodes is now a dict per scenario, each value a 13-element list naming the active artifact at each positional slot:
  - _nodes["s1"] uses dflt services at the swap slots (MAS_3, AS_3, DS_3)
  - _nodes["s2"] and _nodes["aggregate"] use opti services (MAS_4, AS_4, DS_1)
- _routs and _labels also keyed by scenario name (matching _nodes). dflt.json uses the same dict shape for operational consistency: single key "baseline".
- _vars_source removed. It was a workaround for the previous fixed _nodes list + external composition; now that _nodes[scenario] names the right artifacts directly, composition is explicit.
- Labels rewritten without em dashes; each label names the strategy (Retry / Select Reliable), the service swaps, and what stays dflt vs opti.
Generator refactor. src/utils/import_old.py now has two node-to-artifact maps (_DFLT_NODE_MAP, _OPTI_NODE_MAP) and passes the map into load_topology / load_variables / _rename_depends. Re-run: python -m src.utils.import_old.
Delivered.
- data/config/profile/opti.json generated by src/utils/import_old.py from __OLD__/data/config/cs1/optimal_{qn_model,dim_variables}.csv. PACS-style envelope, 13 artifacts, 143 opti variables. environments._scenarios = ["s1", "s2", "aggregate"] with _vars_source = ["dflt", "opti", "opti"] and _routs = [opti, dflt, opti], so each scenario composes (routing × variables) from the right source.
- data/reference/: authors' TAS 1.6 replication dump (Cost-QoS, Preferred-QoS, Reliability-QoS × no-adapt, simple-adapt; six leaf folders, each with invocations.csv, log.csv, results.csv + 8 PNG charts). Column schema in data/reference/profile.md. Treated as the authoritative reproduction target for the experiment method's acceptance criterion.
- data/config/adaptation/ removed. The two stub files (s1.json, s2.json with MAX_TIMEOUTS / timeout_length_ms / parallel_count / rt_threshold_ms placeholders) are redundant now that opti.json enumerates all three after-adaptation scenarios self-sufficiently.
- Docs synced: workflow.md §1/§2 adaptation-axis table and directory layout, CLAUDE.md data convention, README.md axis table + folder tree, quickstart.md adaptation table.
Loader contract (unchanged CLI). --adaptation <baseline|s1|s2|aggregate> still works, but the loader's composition rule tightens:
- baseline → dflt.json (only scenario)
- s1 → opti.json._scenarios[0]; vars from dflt, routing from opti
- s2 → opti.json._scenarios[1]; vars from opti, routing from dflt
- aggregate → opti.json._scenarios[2]; vars from opti, routing from opti
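The composition rule amounts to a four-row lookup table. A sketch with hypothetical names (Composition, resolve_adaptation); the baseline row's index and sources are an assumption that follows from dflt.json having a single scenario.

```python
from typing import NamedTuple


class Composition(NamedTuple):
    profile_file: str
    scenario_index: int
    vars_source: str
    routing_source: str


_COMPOSITION = {
    "baseline": Composition("dflt.json", 0, "dflt", "dflt"),
    "s1": Composition("opti.json", 0, "dflt", "opti"),
    "s2": Composition("opti.json", 1, "opti", "dflt"),
    "aggregate": Composition("opti.json", 2, "opti", "opti"),
}


def resolve_adaptation(name):
    """Map a --adaptation value to its profile file and source pair."""
    try:
        return _COMPOSITION[name]
    except KeyError:
        raise ValueError(f"unknown adaptation: {name!r}") from None


s1 = resolve_adaptation("s1")
```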
SUMMARY.md gained a References section (CS-1 refs [1], [2], [3], [9] Rico, [10], [13]) matching the works actually cited, with a pointer to cs_context.md § References for the full list.
Delivered.
- Config tree scaffolded under data/config/:
  - profile/dflt.json: 13-node topology (M/M/s/K) + 143 PyDASA variables, produced by src/utils/import_old.py from __OLD__/data/config/cs1/default_qn_model.csv + default_dim_variables.csv.
  - adaptation/s1.json, s2.json: stub override files for Retry-style (S1) and Select-Reliable-style (S2) with placeholder params (MAX_TIMEOUTS, timeout_length_ms, parallel_count, rt_threshold_ms).
  - method/stochastic.json: SimPy params (seed=42, 10k invocations, 10 replications, 95 % CIs; mirrors [13] § V-B).
  - method/experiment.json: architectural-experiment params (500 invocations × 6 replications; reproduces [1] Table IV).
- README + SUMMARY rewritten: now scoped to CS-01 TAS only (prior README mixed CS-01 and CS-02). README links to the six notes/*.md + CLAUDE.md; SUMMARY carries the Table IV headline numbers and the R1/R2/R3 targets.
src/utils/import_old.py kept as a committed tool so the conversion is reproducible (not a throwaway). Re-run with python -m src.utils.import_old whenever the old CSVs change.
Repo hygiene decision — results never committed.
Per user: the bulk of result files should not be checked in. Anyone reproducing runs the pipeline locally. Added to .gitignore:
- `data/results/`: all method runs produce JSONs here; ignored en masse.
- `lab/`: future scratchpad PoCs.
- `build/`, `.reports/`, `*.ipynb_checkpoints/`.
Still tracked: data/config/ (all configs, including the 143-variable dflt.json at 114 KB), assets/img/ (figures cited in reports), notes/, src/, tests/.
Next steps.
- Scaffold remaining `src/` subpackages with empty `__init__.py`: `analytic`, `stochastic`, `dimensional`, `experiment`, `view`, `io`, `methods`
- Implement `src/io/config.py` profile ⊕ adaptation merge helper (Move 2)
- Implement `src/methods/analytic.py` + `src/analytic/` M/M/c/K solver as first end-to-end method (Move 3)
- Pytest skeleton mirroring `src/`
- Thin notebook stubs at repo root
Spotted asymmetry between inputs (named by identifier: profile/dflt.json, adaptation/s1.json) and outputs (named by content type: variables.json). Fixed by naming the per-run output file after the profile identifier, matching the PACS precedent (PACS-vars-iter1.json).
Per-run output is now a single JSON named after the profile, following the PACS pattern:
data/results/<method>/<adaptation>/<profile>.json
The file carries a PyDASA-compatible object with content keyed inside:
- `variables`: PyDASA Variable dict (every method)
- `coefficients`: derived DCs (dimensional only)
- `pi_groups`: raw π-groups (dimensional only)
- `deltas`: per-variable differences (comparison only)
Split out: requirements.json. R1/R2/R3 verdicts are profile-agnostic and consulted independently of raw variables; they keep a content-type name.
Adding a second profile is additive. camara.json drops next to dflt.json in the same (method, adaptation) folder; no migration.
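The naming rule can be sketched as two small helpers, assuming the path layout and content keys listed above. `result_path` and `expected_keys` are hypothetical names used only for illustration.

```python
from pathlib import Path

RESULTS_ROOT = Path("data/results")

# Content keys named in this entry, beyond the "variables" key every method writes.
EXTRA_KEYS = {
    "dimensional": ("coefficients", "pi_groups"),
    "comparison": ("deltas",),
}


def result_path(method: str, adaptation: str, profile: str = "dflt") -> Path:
    """data/results/<method>/<adaptation>/<profile>.json"""
    return RESULTS_ROOT / method / adaptation / f"{profile}.json"


def expected_keys(method: str) -> tuple:
    """Top-level keys the per-run JSON is expected to carry for a method."""
    return ("variables",) + EXTRA_KEYS.get(method, ())
```

Under this layout a second profile such as `camara` lands in the same folder as `dflt.json`, which is what makes the change additive.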
Refinements that closed the design.
- Collapsed scenario and strategy into one adaptation axis. In this case study S1 and S2 are two names for the same "after adaptation" concept seen through different scenario lenses: S1 applies switch-to-equivalent (Retry mechanics), S2 applies preferred-service ranking (Select Reliable mechanics). They are not independent axes. Values: `baseline`, `s1`, `s2`, `aggregate`.
- `aggregate` is a real run, not a display rollup. It applies both S1 and S2 overrides together: the realistic deployed configuration a production system would actually use.
- `baseline` is a run tag, not a config file. The profile is the baseline; no `adaptation/baseline.json`. Adaptation configs only exist for S1 and S2; `aggregate` merges both.
- Result and config files are JSON (PACS format), not CSV. Every file uses the PyDASA `Variable`-dict schema keyed by LaTeX symbol with `_sym`, `_dims`, `_units`, `_min`, `_max`, `_setpoint`, `_data`, …, same as `__OLD__/src/notebooks/data/PACS-vars-iter1.json`. Inputs and outputs share the schema, no CSV↔JSON conversion.
- Leaf files: `variables.json` and `requirements.json` for every method; plus `coefficients.json` / `pi_groups.json` for dimensional; plus `deltas.json` for comparison.
- Single CLI shape: `python -m src.methods.<method> --adaptation <baseline|s1|s2|aggregate> [--profile dflt]`. The `src.io` layer handles the profile ⊕ adaptation merge.
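A minimal `argparse` sketch of the single CLI shape. `build_parser` is a hypothetical helper, and making `--adaptation` required is an assumption; the entry only fixes the flag names, the four values, and the `dflt` default.

```python
import argparse

ADAPTATIONS = ("baseline", "s1", "s2", "aggregate")


def build_parser(method: str) -> argparse.ArgumentParser:
    """Build the shared CLI for one method module."""
    parser = argparse.ArgumentParser(prog=f"python -m src.methods.{method}")
    # Assumption: the adaptation must always be stated explicitly.
    parser.add_argument("--adaptation", required=True, choices=ADAPTATIONS)
    parser.add_argument("--profile", default="dflt")
    return parser


# Equivalent to: python -m src.methods.analytic --adaptation s1
args = build_parser("analytic").parse_args(["--adaptation", "s1"])
```

Using `choices` lets the parser reject dropped values such as `retry` before any config is loaded.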
Matrix. 5 methods × 4 adaptations = 20 runs. Each of analytic / stochastic / dimensional / experiment runs 4 adaptations; comparison reads all four methods per adaptation and writes 4 comparison reports.
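The count can be checked by enumerating the matrix, assuming the method and adaptation names used in this entry. The variable names are illustrative only.

```python
from itertools import product

SOLVER_METHODS = ("analytic", "stochastic", "dimensional", "experiment")
ADAPTATIONS = ("baseline", "s1", "s2", "aggregate")

# Four methods each run every adaptation.
solver_runs = list(product(SOLVER_METHODS, ADAPTATIONS))
# comparison reads all four methods for one adaptation and writes one report.
comparison_runs = [("comparison", adaptation) for adaptation in ADAPTATIONS]

all_runs = solver_runs + comparison_runs
print(len(solver_runs), len(comparison_runs), len(all_runs))
```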
Dropped from earlier drafts.
- Separate `scenario` and `adaptation` axes (merged into one).
- CSV leaf files (→ JSON/PyDASA schema).
- Per-strategy adaptation values (`retry`, `select_reliable`): these are the mechanics of S1/S2, not separate options. Documented in method contracts as "S1 = Retry-style, S2 = Select-Reliable-style".
- The "utility" axis (`cost_qos`/`reliability_qos`/`preferred_qos`). R1/R2/R3 are fixed thresholds from Cámara 2023, reported in `requirements.json` per run.
- Four-token flat filename pattern: axes live in the path, leaves are just `<content>.json` (one more token-free).
Next steps.
- Scaffold `src/methods/` modules with `run(adaptation, profile='dflt')` signature and CLI stub
- Scaffold `src/` subpackages (`analytic`, `stochastic`, `dimensional`, `experiment`, `view`, `io`, `utils`)
- Scaffold `data/config/{profile,adaptation,method}/` with stub JSONs in PyDASA Variable-dict format (port `profile/dflt.json` from Table III of [1])
- Scaffold `src.io` profile ⊕ adaptation merge helper
- Create 5 thin notebook stubs at root
- `tests/` mirrors `src/` subpackages
Decision. Every file, folder, and CLI argument uses the same four axes in the same order:
- `<method>` ∈ `{analytic, stochastic, dimensional, experiment, comparison}`: reframed from "stage" to "method" because the code implements DASA's evaluation methods, not sequential stages.
- `<scenario>` ∈ `{s1, s2}`: service failure and response-time variability (the two focus scenarios per `cs_objective.md`; S3–S5 out of scope).
- `<adaptation>` ∈ `{baseline, retry, select_reliable}`: `baseline` = No Adaptation, the before-adaptation reference. `retry` and `select_reliable` are the after-adaptation strategies from Table IV of [1].
- `<profile>` ∈ `{dflt, ...}`: service catalogue variant; CLI flag, defaults to `dflt`.
Naming convention. Paths carry the axes, leaves carry only <scope>.<artifact>.<ext>:
- configs: `data/config/<axis-folder>/<value>.json`
- results: `data/results/<method>/<scenario>/<adaptation>/<scope>.<artifact>.<ext>`
- figures: `assets/img/<method>/<scenario>/<adaptation>/<figure>.png`
- CLI: `python -m src.methods.<method> --scenario <s> --adaptation <a> [--profile <p>]`
Dropped. The "utility" axis (cost_qos/reliability_qos/preferred_qos from __OLD__/data/baseline/cs1/). Those variants corresponded to different weight sets inside R3's utility function. In the new framing, R1/R2/R3 are validation criteria evaluated in requirements.csv per run, not a run axis. Keeps the matrix flat at 30 runs instead of 90.
Why the rename. Single-repo, single case study — no CS-01- prefix needed. "Method" matches the case-study narrative (cs_objective.md frames each as an evaluation method). The four-axis pattern is strictly repetitive: the same four words appear in the same order in every path, filename, and CLI — auditor-friendly, greppable, scriptable.
Run matrix. 5 methods × 2 scenarios × 3 adaptations = 30 runs. The comparison method collapses across the other four per (scenario, adaptation), producing 6 comparison reports.
Validation criteria. Every run emits requirements.csv with one row per R1/R2/R3 target from Cámara 2023:
- R1: failure rate ≤ 0.03 % (Availability)
- R2: response time ≤ 26 ms (Performance)
- R3: minimise cost subject to R1 ∧ R2
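A sketch of how the three rows could be produced, assuming the thresholds stated above. The row keys and the function name are hypothetical, and reporting R3 as "eligible only when R1 and R2 both hold" is an assumption about how a cost objective without its own threshold would be flagged.

```python
R1_MAX_FAILURE_PCT = 0.03   # failure rate threshold, in percent
R2_MAX_RESPONSE_MS = 26.0   # response time threshold, in milliseconds


def evaluate_requirements(failure_pct: float, response_ms: float) -> list:
    """Return one row per requirement, ready to be written out per run."""
    r1 = failure_pct <= R1_MAX_FAILURE_PCT
    r2 = response_ms <= R2_MAX_RESPONSE_MS
    return [
        {"id": "R1", "value": failure_pct, "threshold": R1_MAX_FAILURE_PCT, "passed": r1},
        {"id": "R2", "value": response_ms, "threshold": R2_MAX_RESPONSE_MS, "passed": r2},
        # Assumption: R3 has no threshold of its own; it is conditional on R1 and R2.
        {"id": "R3", "value": None, "threshold": None, "passed": r1 and r2},
    ]
```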
Next steps.
- Scaffold `src/methods/` modules with `run(scenario, adaptation, profile)` signature and CLI stub
- Scaffold `src/` subpackages (`analytic`, `stochastic`, `dimensional`, `experiment`, `view`, `io`, `utils`)
- Scaffold `data/config/{profile,scenario,adaptation,method}/` with stub JSONs (profile/dflt from Table III of [1])
- Create 5 thin notebook stubs at root
- `tests/` mirrors `src/` subpackages
Decision. Pipeline is five stages: S1 Analytic, S2 Stochastic, S3 Dimensional, S4 Comparison, S5 Architectural Experiment. No -CS-01- prefix in filenames (single-case repo). No calibration notebook.
Pattern. Hybrid — each stage is a Python module src/stages/sN.py exposing run(config_path) -> dict and a main() CLI; a thin notebook SN.ipynb at repo root calls run() for narrative and inline display. CLI and notebook produce byte-identical artifacts. Logic lives in src/, never in notebooks.
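A minimal sketch of one stage module under the hybrid pattern. Only `run(config_path) -> dict` and `main()` come from this entry; the `dump` helper, the `--out` flag, and the placeholder payload are assumptions made for illustration.

```python
import argparse
import json
from pathlib import Path


def run(config_path: str) -> dict:
    """Load the stage config and return the result payload (placeholder logic)."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # A real stage would compute results here; this sketch only echoes the config.
    return {"config": config, "status": "ok"}


def dump(result: dict) -> str:
    """Serialise deterministically so CLI and notebook write identical bytes."""
    return json.dumps(result, sort_keys=True, indent=2) + "\n"


def main(argv=None) -> int:
    """CLI entry point; the notebook calls run() and dump() directly instead."""
    parser = argparse.ArgumentParser(description="Stage CLI stub")
    parser.add_argument("config_path")
    parser.add_argument("--out", default="result.json")
    args = parser.parse_args(argv)
    Path(args.out).write_text(dump(run(args.config_path)), encoding="utf-8")
    return 0
```

Routing both entry points through the same `run` and the same serialiser is one way to get the byte-identical guarantee; sorted keys remove dict-ordering differences between callers.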
Why. Optimises for "follow or any external auditor or public exposure":
- CLI makes the pipeline scriptable and CI-friendly; notebooks make it reviewable.
- Unit tests can target `src/` modules directly instead of parsing `.ipynb` JSON.
- Clean git diffs because notebooks stay small.
- Slightly more upfront effort than pure notebooks, but pays back the moment tests or automation are needed.
What was dropped and why.
- Calibration (former `CS-01X`): if the analytic model disagrees with the stochastic ground truth, that is a finding worth reporting, not a parameter to tune away. Config optimization (the `opti_*` prefix from the old artifacts) is a side effect of S4 if a second pass is wanted.
- `CS-01-` prefix: this repo holds exactly one case study; the prefix was pure ceremony.
- `data/baseline/` and `data/analysis/` subfolders: collapsed into `data/config/` (inputs) and `data/results/` (outputs). Simpler I/O contract.
Alternatives considered. Pure notebooks (cheaper start, worse diffs and tests), pure CLI (no narrative for publication), Jupytext paired files (adds a pre-commit hook dependency), Quarto (overkill for early iteration). Hybrid won on long-run maintainability.
Next steps.
- Scaffold `src/` subpackages (`analytic/`, `stochastic/`, `dimensional/`, `experiment/`, `view/`, `io/`, `utils/`) with empty `__init__.py`
- Scaffold `src/stages/s{1..5}.py` with `run()` signature and CLI stub
- Scaffold `tests/` mirroring `src/`
- Create 5 thin notebook stubs (`S1.ipynb` .. `S5.ipynb`)
- Port the service catalogue from `__OLD__/data/config/cs1/default_qn_model.csv` to `data/config/dflt.json`
- Update `README.md` + `SUMMARY.md` to match the new shape
Decision. Archive the prior implementation under __OLD__/ and rebuild the case study on top of the current PyDASA release. The old version mixed closed-form and stochastic results without a clean modelling layer boundary, which made it hard to reproduce the dimensional analysis step.
What moved to __OLD__/:
- 6 notebooks: `CS-01A` (Analytical), `CS-01B` (Stochastic), `CS-01C` (Dimensional), `CS-01D` (Dimensional Simulations), `CS-01E` (Data Analysis), `CS-01X` (Analytical Calibration)
- `src/{model,simulation,utils,view}/`
- `data/{analysis,baseline,config,results/cs1/{data,img}}/`
- Prior notes and commands reference
What stays:
- `LICENSE`, `.gitignore`, high-level `README.md` (to be rewritten and scoped to CS-01 only)
- `requirements.txt` (pinned against PyDASA 0.3.2 wheel)
- `.claude/` skills scaffold (needs pruning: some leftover out-of-scope skills)
Next steps.
- Confirm notebook list and ordering (keep all 6, or collapse `E` into per-model notebooks?)
- Decide whether to port any code from `__OLD__/src/` or start clean against `pydasa` package
- Prune `.claude/skills/` of out-of-scope skills; port `commands/` from `../PyDASA/.claude`
- Rewrite `README.md` + `SUMMARY.md` scoped to CS-01 TAS
- Scaffold empty `src/`, `data/`, `assets/`, `tests/` and notebook stubs
- Decide: keep `__OLD__/` tracked in git, or `.gitignore` it? → Keep tracked during migration; remove once the new notebooks + `src/` reproduce its results.
Closed the model-only chapter at notes/report-A.md — 14 sections, 5 figures embedded, all data-consistency-checked. Headline findings:
- R1 unbinds under the 2026-05-13 locked routing: all four adaptations (baseline / S1 / S2 / aggregate) PASS R1 at $0.00$–$0.39$ % vs the $1$ % threshold. Earlier drafts had baseline + S2 failing at $1.25$ %; the routing-encoding pass on 2026-05-13 (self-loop = ε convention, feedback retry edges, inverse-ε dispatch, [13] Table I alignment, three ε corrections at AS_3 / MAS_4 / AS_4) pushed all four into PASS.
- Four-DC trade-off (Table 6.9) is the chapter's principal RQ-02 evidence: θ, σ, φ drop by 3.6–5.7 % under adaptations; η rises by 0.9–2.4 %. φ (memory utilisation) is only visible in the dimensional view.
- Aggregate is non-additive vs S1 + S2 (mechanism overlap on θ, compound-loading on η). Σθ residual = +0.0324 above additive prediction; Ση = 38.48 super-additively above both S1 (37.90) and S2 (38.23).
- Reference-frame caveat (§6.4.1): bounds σ_R2 = 0.897, η_R1 = 23.23 are baseline-derived and held fixed across adaptations for cross-method legitimacy. Under S2's own design space, η_R1 would tighten to ≈ 13.94, flagged as a deliberate methodological commitment.
Three documentation corrections same session:
- "Cámara 2023 six-decimal replication anchor" retracted; chapter's credibility hook is now input-traceability against [13] Table I.
- The "$\theta \le 0.65$, $\sigma \le 1 - \theta$, $\eta \le 1$, $\phi \le 1$" bounds list was partially fabricated; replaced with the canonical PACS bounds ($\theta < 0.3$, $\sigma < 0.3$, $\rho \le 0.65$, $\phi \le 1$) and a CLAUDE.md "never claim these" entry.
- AS_3 ε corrected from 0.10 to 0.18; MAS_4 / AS_4 epsilon values un-swapped (0.08 ↔ 0.10) per [13] Table I row 4.
Three first-class items carried into next session ([[next-session-todos-2026-05-14]]):
- Fix \wvsc naming collision in `data/config/profile/*::specs`.
- Implement remote deployment (stage 8.1, starter daemon for `dpl="remote"`).
- Parent-PID watchdog for uvicorn workers (calibration leftover-process bug hit twice today).
Memory entries written: project_report_a_drafted_2026_05_14.md, project_locked_routing_2026_05_13.md, project_next_session_todos_2026_05_14.md. MEMORY.md index + CLAUDE.md (notebook conventions, canonical DC bounds) updated.
Drafted notes/report-C.md from the existing notes/report-C-outline.md outline — 14 sections, ~800 lines of markdown. Mirrors Report A's chapter-grade prose style but for the experimental half of CS-1. Companion to notes/ch08-apparatus-outline.md (which documents the apparatus as a research tool for the dissertation's ch08-evaluation subsection).
Substantive findings derived from the existing data at data/results/experimental/<adp>_<fw>_<gr>/ (16 cells already produced):
- F-C.1 Apparatus-error budget is small relative to measured scales. Calibration precision band (~450 µs total) is two orders of magnitude smaller than the response-time scales the 16-run grid measures (30–880 ms). The chapter's `gate.passed=False` flag is documented as a limitation rather than a blocker.
- F-C.3 Predicted verdict is not testable at the model's design rate on this prototype. All 16 cells achieve $X_0 \in [3.05, 45.61]$ req/s against the model's $\lambda_z = 345$ req/s. Calibration shows per-worker saturation at 456–616 req/s (above the design point), so the constraint is multi-hop topology times bounded driver concurrency, not raw single-worker throughput. R1 ranges 6.5–72.3 % (vs model 0.00 %), R2 ranges 30.7–876.2 ms (vs model ≤ 3.6 ms). Every cell fails R1 and R2 by wide margins; the rate-scale gap is the dominant explanation.
- F-C.4 Architectural invariance fails: granularity dominates framework. Granularity-pair $\Delta$R1 ranges +16 to +57 pp; framework-pair $\Delta$R1 ranges +5 to +20 pp. Granularity is roughly $3\times$ framework on R1 and $5\times$ on R2. Expanding from one composite to thirteen services adds six HTTP hops per request; the per-hop queueing interference amplifies failure rate beyond what the model's per-call $\varepsilon_j$ predicts.
- F-C.5 Cost-per-evidence is asymmetric. Report A's three model iterations produced 24 PASS verdicts at notebook cost; Report C's apparatus produced 16 FAIL verdicts plus the rate-scale finding at ~$10\times$ the wall-clock + engineering cost. The two methods are complementary cost regimes, not substitutes; the dissertation §7.3 cross-case comparison should fold this into "when each is the right tool".
- F-C.6 Where measured diverges from predicted, the gap is the finding. The 16-cell grid does not falsify Report A's predictions in the strict sense; it falsifies the testability of those predictions on this stack. Named follow-ups (remote deployment, UDS upgrade, rate-rescaling) are the engineering closure path.
Falsification verdict has three layers:
- L1: prediction not directly testable at design rate on this prototype.
- L2: framework + granularity invariance both falsified at the prototype's accessible rates.
- L3: qualitative preference-ordering between adaptations matches Report A's prediction, even where absolute values do not — partial confirmation.
Three named follow-ups (mirroring next-session TODOs):
- Remote deployment (stage 8.1, [[next-session-todos-2026-05-14]] item 2) — closes the rate-scale gap.
- Rate-rescaling sensitivity sweep: re-run Report A at $\lambda_z = 50$ req/s for direct predicted-vs-measured comparison at matched rate (cheap analytical bridge).
- Cross-case synthesis with CS-2 (IoT-SDP) at dissertation §7.3.
notes/report-C-outline.md was drafted earlier (other-agent scope, full chapter outline); notes/report-C.md is the body realisation. Two coordinated files now exist: outline (planning) + body (chapter draft). Companion file notes/ch08-apparatus-outline.md carries the parallel apparatus-documentation outline for the dissertation's ch08-evaluation subsection (separate scope: documents the apparatus as a research tool, not the falsification verdict).
- Does PyDASA 0.3.2 already expose the π-group builders this case study needs, or do we need helpers in local `src/`?
- Calibration notebook (`CS-01X`): keep as separate deliverable or fold into `CS-01A`?
Tooling / bugfix session, no research findings.
- Multi-trial benchmark moved into the calibration notebook. The 16-cell x N-trial apparatus benchmark left `05-experimental.ipynb` and became section 5 of `00-calibration.ipynb` (it characterises the apparatus, sibling to calibration). Logic consolidated in `src/experimental/procedure/bench.py` (`run_bench` / `summarize_bench` / `save_bench_summary` / `profile_stages`); plotters in `src/view/bench.py`. Output: data under `BENCH_DATA_DIR` = `data/results/calibration/bench/`, figures under `BENCH_IMG_DIR` = `data/img/calibration/bench/` (both public constants). `summarize_bench` loads R1/R2 + design `lambda_z` from config and attaches them to the frame's `.attrs`; the plotters read thresholds + axes off the frame and carry no constants. Temp scripts `_bench_16cells.py` / `_profile_344ms.py` / `_sweep_grace.py` deleted.
- `methods/experimental.py` private re-exports stripped by an editor autofix. A ruff/autoflake "remove unused import" autofix-on-save deleted the test-monkeypatch private re-exports (`_resolve_admission`, `_drive_trial`, ...), causing 10 `AttributeError` test failures. Fixed by listing every re-exported private name in `__all__` so linters treat them as used.
- `cleanup_calibration_ports` crashed the Jupyter kernel. It killed every process LISTENING on a port in the swept range (`run_bench` swept 8001-8049 + 9001-9049); a kernel ZMQ port landed in-range, so `proc.kill()` hit the kernel, surfaced as "The Kernel crashed", no traceback. Fixed to kill only descendants of the current process (workers we spawned); ancestors (kernel, VS Code server, shell) and strangers are never touched.
- Bounds gate is per-dpl. `05-experimental.ipynb` runs `DPL="multiprocess"`; `run_experiment` -> `_maybe_check_bounds` reads `find_latest_envelope("multiprocess")`, whose measured `r_max_req_s` is noisy run-to-run (136..590). A run that lands below `lambda_z=248` raises `EnvelopeExceededError`. The grid cell now passes `skip_bounds_check=True`, matching `experimental.json::trial._note_rate` design intent (drive the design rate, let undershoot land in `verdict.operational.X_0_req_per_s`).
Full pytest: 575 passed.
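The descendants-only rule behind the `cleanup_calibration_ports` fix can be sketched over a plain pid-to-parent table. Both function names are hypothetical, and in the real cleanup the table would come from whatever process library it already uses; this sketch only shows the filtering logic.

```python
def descendants(root_pid: int, parent_of: dict) -> set:
    """Return every pid whose ancestor chain reaches root_pid."""
    children = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)
    found, stack = set(), [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def pids_safe_to_kill(listeners, root_pid: int, parent_of: dict) -> list:
    """Keep only in-range listeners that the current process spawned."""
    return sorted(set(listeners) & descendants(root_pid, parent_of))


# shell 10 -> kernel 20 -> workers 30, 31; worker 30 -> 40; stranger 50.
parent_of = {10: 1, 20: 10, 30: 20, 31: 20, 40: 30, 50: 1}
# The kernel (20) and a stranger (50) both listen in the swept port range.
safe = pids_safe_to_kill([20, 30, 40, 50], root_pid=20, parent_of=parent_of)
```

With the kernel as the current process, `safe` contains only the spawned workers 30 and 40; the kernel itself, its ancestors, and the unrelated listener are filtered out.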
Tooling session, two apparatus changes. No research findings.
The collapsed run could not produce per-stage TAS_{2..6} data (those stages are workflow code inside TAS_1, not processes); expanded produces it but at the cost of five extra processes and six HTTP hops. collapsed_staged is the middle option: the collapsed mesh (TAS_1 + 7 third-party atomics, no extra processes) plus per-stage timing reconstructed post-hoc.
- Engine (`WorkflowEngine`): new `stage_route_lt` (operation -> internal-stage id). When set, `_dispatch` brackets each step: stamp `t_in`, run the dispatch, stamp `t_out`, append a synthetic `WorkflowStep` for the internal stage spanning `[t_in, t_out]`. No mu-sleep: TAS internal stages are passthroughs (mu is only for third-party atomics), consistent with `inject_internal_stage_mu=false`.
- The stage entries ride the existing audit path: `body.workflow.steps` -> composite flow JSONL `steps`.
- Aggregator (`observed_nodes_from_run`): new `staged_flows_path` / `staged_stage_ids`. `_staged_stage_rows` walks the flow JSONL and derives TAS_{2..6} rows: A/C from the bracket count, W = mean span, lambda = A/T_s, L = lambda*W. `mu`/`rho` are NaN and `c`/`K` are None, because an internal stage is a passthrough, not an M/M/c/K station.
- `target.json::workflows` maps `collapsed_staged -> tas` (reuses the collapsed workflow file; `stage_route_lt` is built by inverting `target.json::stage_routes`). `_op_to_stage_lt` does the inversion.
- Plain `collapsed` / `expanded` are untouched: `stage_route_lt` is None for them, so no brackets are emitted.
- `05-experimental.ipynb` cell 5 passes `staged_flows_path` when the run reports `staged_stage_ids`; `GRANULARITIES` stays `["collapsed"]` (flip to opt in).
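The per-stage derivation can be sketched as below, assuming each bracket is available as a `(stage_id, t_in, t_out)` tuple and that A and C both equal the bracket count. `staged_rows` is a hypothetical stand-in, not the real `_staged_stage_rows`, and the input shape is an assumption about the flow JSONL.

```python
import math
from collections import defaultdict


def staged_rows(steps, t_s: float) -> dict:
    """steps: iterable of (stage_id, t_in, t_out); t_s: observation window in seconds."""
    spans = defaultdict(list)
    for stage_id, t_in, t_out in steps:
        spans[stage_id].append(t_out - t_in)
    rows = {}
    for stage_id, values in sorted(spans.items()):
        count = len(values)          # A and C: one arrival and one completion per bracket
        w = sum(values) / count      # W = mean span
        lam = count / t_s            # lambda = A / T_s
        rows[stage_id] = {
            "A": count, "C": count, "W": w, "lambda": lam, "L": lam * w,
            # A passthrough stage is not a queueing station, so these stay undefined.
            "mu": math.nan, "rho": math.nan, "c": None, "K": None,
        }
    return rows
```

Note that L here is Little's law applied to the measured span, so it describes time spent inside the bracket (including the downstream dispatch), not queueing at the stage itself.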
Previously w_proc scaled one process's (c, K) to (c*w, K*w) — which silently modelled w separate processes as one pooled M/M/(cw)/(Kw) station (shared queue). They are not: w processes each own a private event loop and a private admission queue (split queues). c=1, w_proc=2 is two independent single-server queues, not a c=2 station.
- `_w_proc` returns the process count, clamped to `[1, MAX_WORKERS]` (`MAX_WORKERS = 8`).
- `_build_mesh_specs`: every service (TAS_1, internal stages, atomics) spawns `w_proc` processes via `MeshSpec.workers`. New `_slot_urls` enumerates all `w_proc` worker URLs inside the service's `PORT_STRIDE` slot so the composite's `ServiceClient` round-robins across them.
- Deleted `_effective_admission` (the `(c,K)*w` multiplier); each process keeps base `(c, K)`. `verdict.json::mesh` rows now carry `{c, K, mu, eps, w_proc}` (base `(c,K)` + the worker count recorded separately) via the new `_mesh_row` helper.
- Removed `target.json::tas_workers` (and `_note_tas_workers`); per-service `w_proc` lives only in the profile specs layer. `run_experiment` lost its `tas_workers` parameter; the run summary reports `tas_w_proc`.
- Profile specs (`dflt.json` / `opti.json`): every specs node `K` 16 -> 32; `w_proc_{TAS_{1}}` 4 -> 2 (atomics + TAS_{2..6} already `w_proc=1`, `c=1`). Artifacts layer untouched (`c=1`, `K=16`: the frozen Camara model).
- With the confirmed config a collapsed run spawns TAS_1 x2 + 7 atomics x1 = 9 processes; expanded adds TAS_{2..6} x1 = 14.
Full pytest: 585 passed.
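Why the pooled reading was wrong can be illustrated with the textbook M/M/c/K steady-state formula. This is a sketch under stated assumptions, not the project's solver: K is taken as total capacity, the rates are made-up illustrative numbers, and each split process is assumed to see an independent Poisson stream of rate λ/w (round-robin dispatch would actually be smoother than that).

```python
from math import factorial


def mmck_state_probs(lam: float, mu: float, c: int, k: int) -> list:
    """Steady-state probabilities p_0..p_K of an M/M/c/K station (K = total capacity)."""
    a = lam / mu
    weights = []
    for n in range(k + 1):
        if n <= c:
            weights.append(a ** n / factorial(n))
        else:
            weights.append(a ** n / (factorial(c) * c ** (n - c)))
    total = sum(weights)
    return [w / total for w in weights]


def blocking(lam: float, mu: float, c: int, k: int) -> float:
    """Probability that an arrival finds the station full (p_K)."""
    return mmck_state_probs(lam, mu, c, k)[-1]


# Illustrative numbers only; none of these are the case study's parameters.
lam, mu, c, k, w = 1.6, 1.0, 1, 4, 2
pooled = blocking(lam, mu, c * w, k * w)   # old model: one shared M/M/(cw)/(Kw) queue
split = blocking(lam / w, mu, c, k)        # w private queues, each fed lam / w
print(f"pooled={pooled:.4f} split={split:.4f}")
```

Under these assumptions the split configuration blocks noticeably more often than the pooled one, which is the optimism the `(c*w, K*w)` scaling silently introduced.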