Skip to content

Comments

[EXPERIMENTAL] feat(tracer): add TLA+ formal verification of concurrency invariants#4426

Draft
kakkoyun wants to merge 4 commits intoDataDog:mainfrom
kakkoyun:kakkoyun/specula-formal-verification
Draft

[EXPERIMENTAL] feat(tracer): add TLA+ formal verification of concurrency invariants#4426
kakkoyun wants to merge 4 commits intoDataDog:mainfrom
kakkoyun:kakkoyun/specula-formal-verification

Conversation

@kakkoyun
Copy link
Member

@kakkoyun kakkoyun commented Feb 17, 2026

WIP: This still requires a lot work and validation

Using https://github.com/specula-org/Specula

What does this PR do?

Adds TLA+ formal verification for two critical concurrency protocols in dd-trace-go:

  • Phase 1 — Span Lifecycle: Verifies lock ordering (span.mutrace.mu), finish-guard pattern, partial flush lock inversion safety (#incident-46344), and deadlock freedom. TLC explored 4,437 distinct states across 2 goroutines and 3 spans with all 6 invariants passing.
  • Phase 2 — GLS Push/Pop: Verifies push/pop pairing, stack depth correctness, no-leak-on-finish, LIFO ordering, and goroutine isolation. TLC explored 159 distinct states with all 8 invariants passing. Directly complements the kakkoyun/orchestrion_gls_leak branch work.

Includes:

  • Hand-written TLA+ specs with TLC model configurations
  • Extracted Go source files preserving only concurrency-relevant protocol
  • Specula pipeline scripts and configuration for automated Go→TLA+ translation
  • CI workflow running TLC model checker on every change to formal-verification/
  • Documentation covering background, setup, model explanation, and result interpretation

Motivation

dd-trace-go's tracer has complex concurrency invariants (lock ordering, finish-guard, GLS stack pairing) that are currently enforced only by code comments, runtime assertions, and -race testing. The checklocks CI job is disabled pending annotation coverage. Formal verification with TLA+/TLC mathematically proves these invariants hold for all possible interleavings — not just those exercised by tests.

This is experimental infrastructure to explore whether Specula-based formal verification is practical for dd-trace-go's concurrency model.

Related: #4384 (orchestrion GLS leak fix)

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • New code is free of linting errors. You can check this by running make lint locally.
  • New code doesn't break existing tests. You can check this by running make test locally.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • All generated files are up to date. You can check this by running make generate locally.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running make fix-modules locally.

Unsure? Have a question? Request a review!

Adds Specula-based formal verification for two critical concurrency
protocols in dd-trace-go:

Phase 1 — Span Lifecycle: verifies lock ordering (span.mu → trace.mu),
finish-guard pattern, partial flush lock inversion safety, and deadlock
freedom. TLC explored 4,437 distinct states across 2 goroutines and
3 spans with all 6 invariants passing.

Phase 2 — GLS Push/Pop: verifies push/pop pairing, stack depth
correctness, no-leak-on-finish, LIFO ordering, and goroutine isolation.
TLC explored 159 distinct states with all 8 invariants passing.
Complements the kakkoyun/orchestrion_gls_leak branch work.

Includes extracted Go source files, hand-written TLA+ specs with TLC
configs, Specula pipeline scripts, and documentation.
Runs TLC model checker on both specs (span lifecycle and GLS push/pop)
when formal-verification/ files change. Validates Go source extraction
compiles. Uses pinned action SHAs and cached tla2tools.jar.
- actions/setup-java v4.7.1 -> v5.2.0
- actions/cache v4.2.3 -> v5.0.3
- actions/setup-go v5.5.0 -> v6.2.0
@kakkoyun kakkoyun changed the title [EXPERIMENTAL] feat(tracer): add TLA+ formal verification of concurrency invariants feat(tracer): add TLA+ formal verification of concurrency invariants Feb 17, 2026
@kakkoyun kakkoyun changed the title feat(tracer): add TLA+ formal verification of concurrency invariants [EXPERIMENTAL] feat(tracer): add TLA+ formal verification of concurrency invariants Feb 17, 2026
- Add Apache 2.0 copyright headers to extracted Go source files
- Fix gofmt comment formatting (3-space → 2-space numbered lists)
- Bump go.mod directive to go 1.25.0 to match CI toolchain
- Fix tla2tools.jar verification (tlc2.TLC -h exits non-zero)
@kakkoyun kakkoyun added the AI Assisted AI/LLM assistance used in this PR (partially or fully) label Feb 17, 2026
@felixge felixge self-requested a review February 17, 2026 19:50
Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ping me for approval once this is ready to look at.

@kakkoyun
Copy link
Member Author

Please ping me for approval once this is ready to look at.

Of course. But it is far from ready :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Assisted AI/LLM assistance used in this PR (partially or fully)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants