fix(local-gate): six targeted fixes to make local test gate reliable #407
lollonet wants to merge 1 commit
Pre-release-tag hardening pass. All fixes are minimal, with no broad refactors.

1. scripts/common/unified-log.sh

`printf '[%(%H:%M:%S)T]'` requires Bash 4.2+; the macOS default is 3.2, where it silently no-ops and produces log lines without timestamps. Replace with `$(date +%H:%M:%S)`, which is portable across Bash 3.2 and 5+. The fork cost is negligible compared with running the gate.

2. scripts/device-smoke.sh + tests/test_device_smoke.sh

The smoke script runs real dig/chronyc/avahi-browse/arping plus journalctl, /proc reads, and live HTTP probes against snapserver and metadata-service. On macOS and generic CI runners every one of those fails (no DNS, no mDNS responder, no Linux kernel), so the test hung until timeout. Introduce `SMOKE_SKIP_NETWORK=1` as an escape hatch that gates:

- parallel net checks (dns/ntp/mdns/arping)
- live HTTP probes (Snapcast JSON-RPC, metadata /health, /status)
- the modular Linux-only checks (boot_health, mounts, qos, timers, system, audio, env, mdns, snapcast)
- the journalctl recent-errors scan

Production runs (firstboot, fleet-smoke, ADR-005 release gate) leave the flag UNSET, so behaviour is unchanged. The test mocks the remaining Linux-only binaries (journalctl, ip, ss, vcgencmd, tc, …) as no-ops.

3. tests/test_metadata_plugin_bugs.sh

The disconnect-path assertion used `awk '/finally:/,/logger.info.*disconnected/'`, which matches the FIRST `finally:` in metadata-service.py (line 665, inside `fetch_album_artwork`'s connection cleanup), not `ws_handler`'s own finally at line 1919. That is a false positive: ws_handler could regress and the test would still pass. Scope the awk to `ws_handler()`'s body by bounding it between `^async def ws_handler` and the next top-level `def`/`async def`. Also assert that the extracted block is non-empty (belt-and-braces).

4. client/tests/test_discover_server.sh

The function was renamed from `_update_server` to `_apply_server` in an earlier refactor; the test kept extracting the old name and silently broke (6 FAILs, 2 PASSes). Rewrite to extract `_apply_server` + `_current_host`, stub `_log` and `_compose_up`, and assert the same invariants:

- no change → return 1
- otherwise write .env BEFORE invoking compose, so the new SNAPSERVER_HOST is visible to the recreate
- compose failure preserves the updated .env
- non-watch mode writes .env but doesn't restart

13/13 PASS.

5. scripts/deploy.sh

`install_systemd_service` ran `docker compose down >/dev/null 2>&1` before `systemctl start` to ensure ExecStart picks up the just-written .env. That's destructive: 30-40 s of audio silence in `both` mode plus a full network/image teardown. Replace with `docker compose up -d --force-recreate`, which re-evaluates compose + env and recreates the containers in place. Typical cost 5-10 s. The `|| true` keeps deploy.sh resilient when no compose project exists yet (fresh install). The systemd unit's own ExecStartPre mem-drift guard covers a separate edge (cgroup v2 not active at first create); this line covers the .env-change path.

6. client/common/scripts/setup.sh

`curl -fsSL https://get.docker.com | sh` as a last-resort fallback when install-docker.sh isn't found is weak: it pipes an unverified script from the internet into a root shell. prepare-sd.sh ships install-docker.sh on every SD layout, so its absence means the SD card was prepared wrong. Replace with an explicit failure that lists the candidate paths it checked and the recovery action (re-run prepare-sd.sh). Failing loud is safer than silently piping.

Test-suite delta after fixes (excluding test_compose_health.sh per CI):

- Server: 50 → 51 PASS, 0 FAIL (was 49 PASS + 1 FAIL = test_device_smoke on timeout; it now passes deterministically).
- Client: 5/5 PASS (test_discover_server.sh was failing pre-fix, now green).
- Pytest: 177/177 PASS (no change, but verified unaffected).

tests/test_device_smoke_parallel_network.sh is also updated to tolerate the new indentation under the SMOKE_SKIP_NETWORK guard.
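The scoping change in fix 3 can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the sample file is a hypothetical stand-in for metadata-service.py, and `extract_ws_handler` is a name invented here. It compares the old range pattern with an extraction bounded by `^async def ws_handler` and the next top-level `def`/`async def`.

```shell
# Hypothetical stand-in for metadata-service.py: an earlier helper with its
# own `finally:` precedes ws_handler, which is what caused the false positive.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
async def fetch_album_artwork(url):
    try:
        pass
    finally:
        conn_close()

async def ws_handler(ws):
    try:
        await serve(ws)
    finally:
        clients.discard(ws)
        logger.info("client disconnected")

def main():
    pass
EOF

# Print only ws_handler's body: start at its definition and stop at the
# next top-level def / async def (a line starting in column 0).
extract_ws_handler() {
    awk '
        /^async def ws_handler/ { inside = 1; print; next }
        inside && /^(async )?def / { exit }
        inside { print }
    ' "$1"
}

# Old pattern: starts at the FIRST `finally:` in the file.
old_block=$(awk '/finally:/,/logger.info.*disconnected/' "$sample")
# Scoped pattern: limited to ws_handler.
new_block=$(extract_ws_handler "$sample")

# Belt-and-braces: an empty extraction must fail rather than pass vacuously.
if [ -z "$new_block" ]; then
    echo "FAIL: ws_handler body not found" >&2
    exit 1
fi
```

In this sample, `old_block` picks up `conn_close()` from the unrelated helper, while `new_block` contains only lines from `ws_handler`, so an assertion run against it cannot be satisfied by another function's cleanup code.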
No CHANGELOG entry (parallel-branch rule); changes are accumulated for v0.7.5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
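For readers unfamiliar with the escape hatch from fix 2, the guard has roughly the shape below. This is a sketch under assumptions: `info`, `run_network_checks`, and `smoke_network_stage` are placeholder names, and the real check functions in device-smoke.sh are not reproduced here. The `info` helper also uses the portable `$(date +%H:%M:%S)` timestamp from fix 1.

```shell
# Placeholder logger using the portable timestamp form (works on Bash 3.2).
info() {
    printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*"
}

# Placeholder for the real probes (dig, chronyc, avahi-browse, arping, ...).
run_network_checks() {
    info "running network checks"
}

# Skip the network stage only when the flag is explicitly set to 1;
# an unset flag (production runs) keeps the existing behaviour.
smoke_network_stage() {
    if [ "${SMOKE_SKIP_NETWORK:-0}" = "1" ]; then
        info "SMOKE_SKIP_NETWORK=1, network checks skipped"
        return 0
    fi
    run_network_checks
}
```

Defaulting the expansion to `0` means the guard behaves the same whether the variable is unset or empty, which is what keeps firstboot and fleet-smoke runs unchanged.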
Ready to merge, no blockers. Three minor nits below.

Findings

LOW: Duplicate skip message

Suggested fix: remove the pre-section block:

```diff
-# Recent-errors journalctl scan is also Linux-only; same gate.
-if [[ "${SMOKE_SKIP_NETWORK:-0}" == "1" ]]; then
-  info "SMOKE_SKIP_NETWORK=1 — journalctl recent-error scan skipped"
-fi
-
 section "Recent Errors"
```

LOW: Non-watch mode test doesn't assert compose was NOT called

Suggested addition:

```bash
if [[ "$watch_mode" == "false" ]]; then
    assert_eq "" "$restarted_with" \
        "$desc: non-watch mode must not invoke compose up"
fi
```

LOW: Multi-line comment blocks exceed CLAUDE.md convention

CLAUDE.md states: "Never write multi-paragraph docstrings or multi-line comment blocks — one short line max." Several comment blocks added in this PR run 5–7 lines. The WHY in each is valid, but the verbosity is excessive by project convention.
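The second finding (non-watch mode must not invoke compose) can be illustrated with a self-contained stub. Everything here is a hypothetical stand-in: the `_apply_server` body below is a simplified guess at the behaviour described in the PR, not the real function, and `assert_eq` is a minimal local helper rather than the project's test harness.

```shell
env_file=$(mktemp)
trap 'rm -f "$env_file"' EXIT
restarted_with=""

_log() { :; }
_current_host() { sed -n 's/^SNAPSERVER_HOST=//p' "$env_file"; }
# Stub: records which host was in .env at the moment compose would run.
_compose_up() { restarted_with=$(_current_host); }

# Simplified stand-in for the function under test.
_apply_server() {
    local new_host="$1" watch_mode="$2"
    if [ "$(_current_host)" = "$new_host" ]; then
        return 1                                   # no change
    fi
    printf 'SNAPSERVER_HOST=%s\n' "$new_host" > "$env_file"   # write .env first
    if [ "$watch_mode" = "true" ]; then
        _compose_up                                # restart only in watch mode
    fi
    return 0
}

assert_eq() {
    if [ "$1" != "$2" ]; then
        echo "FAIL: $3 (expected '$1', got '$2')" >&2
        return 1
    fi
}

# Non-watch mode: .env is updated, compose stub is never called.
printf 'SNAPSERVER_HOST=old.local\n' > "$env_file"
_apply_server new.local false
assert_eq "" "$restarted_with" "non-watch mode must not invoke compose up"
assert_eq "new.local" "$(_current_host)" "non-watch mode still writes .env"
```

Because the stub reads .env at call time, the same `restarted_with` variable also covers the ordering invariant: in watch mode it only equals the new host if .env was written before compose ran.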
Summary
Pre-release-tag hardening: six targeted fixes to make the local test gate reliable on macOS and generic CI runners. No broad refactors; each fix is minimal and verified.

Files modified
- scripts/common/unified-log.sh
- scripts/device-smoke.sh
- tests/test_device_smoke.sh
- tests/test_device_smoke_parallel_network.sh
- tests/test_metadata_plugin_bugs.sh
- client/tests/test_discover_server.sh
- scripts/deploy.sh
- client/common/scripts/setup.sh

Detailed fixes

1. Logging on Bash 3.2
`unified-log.sh:103` used `printf '[%(%H:%M:%S)T]'` (a Bash 4.2+ builtin feature). On the macOS default (Bash 3.2) the format string is emitted literally, so log lines carry no timestamp. Replaced with `$(date +%H:%M:%S)`. The fork cost is negligible compared with the cost of the gate.

2. Deterministic device-smoke test
The script ran real DNS/NTP/mDNS/arping checks plus journalctl, /proc reads, and HTTP probes against snapserver/metadata-service. On macOS CI the test hung until timeout. Introduced `SMOKE_SKIP_NETWORK=1`, which gates:
- the parallel network checks (`_net_check_*`)
- the live HTTP probes
- the modular Linux-only checks
- the `journalctl` Recent Errors scan

Production runs (firstboot, fleet-smoke, ADR-005) leave the flag UNSET, so behaviour is unchanged. The test mocks the remaining Linux-only binaries (journalctl, ip, ss, vcgencmd, tc, ...) as no-ops.

3. False positive in the metadata-service test
`test_metadata_plugin_bugs.sh:88` used `awk '/finally:/,/logger.info.*disconnected/'`, which matches the first `finally:` in the file (line 665, in `fetch_album_artwork`), not the one in `ws_handler` (line 1919). False positive: ws_handler could regress and the test would still pass. Fix: scope the awk to the body of `ws_handler`, bounded between `^async def ws_handler` and the next top-level def. Added a belt-and-braces check that the extracted block is non-empty.

4. Stale discover-server test
The function was renamed from `_update_server` to `_apply_server` in an earlier refactor; the test kept extracting the old name and failed silently (6 FAIL / 2 PASS). Rewritten to extract `_apply_server` + `_current_host` and to stub `_log` and `_compose_up`. It keeps the ordering assertions (.env flushed BEFORE compose up). 13/13 PASS.

5. Deploy lifecycle
`install_systemd_service` ran `docker compose down` before `systemctl start` to make sure ExecStart read the new `.env`. Destructive: 30-40 s of audio silence plus a network/image teardown. Replaced with `docker compose up -d --force-recreate`, which re-evaluates compose + env and recreates the containers in place. Typical cost 5-10 s. The `|| true` keeps the script resilient when the compose project does not exist yet (fresh install).

6. Security: no curl|sh fallback
`client/common/scripts/setup.sh:541` had `curl -fsSL https://get.docker.com | sh` as a last resort. prepare-sd.sh ships `install-docker.sh` on every SD layout, so its absence means the SD card was prepared incorrectly. Replaced with an explicit failure that lists the candidate paths checked plus the recovery action (re-run prepare-sd.sh). Failing loud beats piping silently.

Local validation
- `bash -n` on all modified files
- `shellcheck -S warning -x` on all modified files
- tests/test_logging_consolidation.sh
- tests/test_device_smoke.sh
- tests/test_metadata_plugin_bugs.sh
- client/tests/test_discover_server.sh
- tests/test_prepare_sd_required_files.sh
- `.venv/bin/python -m pytest tests client/tests -q`

Residual risks
- Renaming the flag to `SMOKE_LOCAL_GATE=1` or `SMOKE_PORTABLE=1` can be deferred to a later change if the name becomes misleading (out of scope here).
- If `install-docker.sh` is missing, it must be supplied manually or Docker must be pre-installed. Trade-off accepted for security.

No CHANGELOG entry: parallel-branch rule, accumulated for release tag v0.7.5.
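As an illustration of the fail-loud behaviour in fix 6, the lookup could look something like the sketch below. The function name `find_install_docker` and the idea of passing candidate paths as arguments are assumptions made for this example; the actual candidate list in setup.sh is not shown in this thread.

```shell
# Print the first existing candidate path, or fail with a message that
# lists every path checked and the recovery action.
find_install_docker() {
    local candidate
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    {
        echo "ERROR: install-docker.sh not found. Checked:"
        for candidate in "$@"; do
            echo "  - $candidate"
        done
        echo "Recovery: re-run prepare-sd.sh to rebuild the SD card."
    } >&2
    return 1
}
```

A caller would stop on a non-zero return instead of falling back to a network download, which is the trade-off noted under residual risks.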