bug: @@assert BamlValidationError discards assertion failure details — causes always empty #3289

@RobertdWray

Description

1. Why This Is Important

We use BAML @@assert constraints to enforce structural invariants on LLM-generated medical food safety data (array length parity, score ranges, character limits). When an assertion fails, BamlValidationError says "Assertions failed." but provides zero information about which assertion failed. The causes array in the internal ParsingError is always empty.

Business impact: We process ~10,000 tray images per batch run. At a 0.13% assertion failure rate, that's ~13 failures per run. Each failure requires manual investigation because the error message doesn't identify the failing assertion. With 8 assertions per class, the debugging surface is 8x larger than it needs to be. At production scale (multiple facilities, daily runs), this becomes untenable.

Secondary issue: In at least one case, the raw LLM output satisfies all defined assertions when manually evaluated, yet BAML still raises BamlValidationError. This suggests either: (a) assertions are evaluated against an intermediate representation that differs from the final coerced output, or (b) there's a false-positive path in the assertion evaluation pipeline.

Downstream systems affected: Any BAML user relying on @@assert for validation in high-volume pipelines. The error is non-actionable without code-level debugging.


2. Severity Classification

severity:medium — Feature partially broken. @@assert works (it catches violations), but the error reporting is non-functional. Users cannot determine which assertion failed without manual re-evaluation of all assertions against the raw output. The potential false-positive path (second issue) elevates this from low to medium.


3. Files Impacted

  • engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs (directly broken): validate_asserts() builds per-assertion failure details, then discards them
  • engine/baml-runtime/src/errors.rs (structurally limited): ExposedError::ValidationError has no field for structured assertion failure data
  • engine/language_client_python/src/errors.rs (affected): raise_baml_validation_error() passes only four strings, no structured causes
  • engine/baml-lib/jsonish/src/deserializer/coercer/ir_ref/coerce_class.rs (requires verification): class-level @@assert evaluation; may explain the false-positive path

4. Relevant Code (Verbatim)

validate_asserts — failure details are built then discarded

File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs

pub fn validate_asserts(constraints: &[(Constraint, bool)]) -> Result<(), ParsingError> {
    let failing_asserts = constraints
        .iter()
        .filter_map(
            |(Constraint { level, expression, label }, result)| {
                if !result && ConstraintLevel::Assert == *level {
                    Some((label, expression))
                } else {
                    None
                }
            },
        )
        .collect::<Vec<_>>();
    let causes = failing_asserts
        .into_iter()
        .map(|(label, expr)| ParsingError {
            causes: vec![],
            reason: format!(
                "Failed: {}{}",
                label.as_ref().map_or("".to_string(), |l| format!("{l} ")),
                expr.0
            ),
            scope: vec![],
        })
        .collect::<Vec<_>>();
    if !causes.is_empty() {
        Err(ParsingError {
            causes: vec![],  // <-- per-assertion details discarded here
            reason: "Assertions failed.".to_string(),
            // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG.
            scope: vec![],
        })
    } else {
        Ok(())
    }
}

The function computes a causes Vec containing one ParsingError per failing assertion (with label and expression), then returns ParsingError { causes: vec![], ... } — discarding the diagnostics entirely.
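To make the discard concrete, here is a minimal Python sketch that mirrors the Rust logic above (hypothetical dict-based stand-ins for Constraint and ParsingError, not BAML's actual types), with the one-line fix applied so the per-assertion diagnostics are propagated instead of dropped:

```python
# Sketch of validate_asserts with causes propagated (hypothetical types,
# mirroring the Rust code above; not BAML's actual implementation).

def validate_asserts(constraints):
    """constraints: list of (label, expression, passed) tuples."""
    # Build one diagnostic per failing assertion -- this mirrors the
    # `causes` Vec the Rust code computes and then throws away.
    causes = [
        {"reason": f"Failed: {label + ' ' if label else ''}{expr}",
         "causes": [], "scope": []}
        for (label, expr, passed) in constraints
        if not passed
    ]
    if causes:
        return {
            "reason": "Assertions failed.",
            "causes": causes,  # proposed: propagate instead of returning []
            "scope": [],
        }
    return None

err = validate_asserts([
    ("valid_score", "this.assessment_score >= 0 and this.assessment_score <= 10", True),
    ("ticket_pair_length", "this.meal_ticket_items|length == this.meal_ticket_confidence|length", False),
])
print(err["causes"][0]["reason"])
# Failed: ticket_pair_length this.meal_ticket_items|length == this.meal_ticket_confidence|length
```

With causes populated, the caller can tell exactly which labeled assertion failed instead of re-deriving it by hand.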

ExposedError::ValidationError — no structured cause field

File: engine/baml-runtime/src/errors.rs

pub enum ExposedError {
    /// Error in parsing post calling the LLM
    ValidationError {
        prompt: String,
        raw_output: String,
        message: String,
        detailed_message: String,
    },
    // ...
}

Only four string fields; there is no way to carry structured assertion-failure data to the Python/TypeScript client.

Python bridge — four strings only

File: engine/language_client_python/src/errors.rs

fn raise_baml_validation_error(
    prompt: String,
    message: String,
    raw_output: String,
    detailed_message: String,
) -> PyErr {
    Python::with_gil(|py| {
        let internal_monkeypatch = py.import("baml_py.internal_monkeypatch").unwrap();
        let exception = internal_monkeypatch.getattr("BamlValidationError").unwrap();
        let args = (prompt, message, raw_output, detailed_message);
        let inst = exception.call1(args).unwrap();
        PyErr::from_value(inst)
    })
}

Our BAML schema — 8 assertions, all pass on the "failing" output

File: (our project) baml_src/types/tray_analysis.baml

class TrayAnalysis {
  // ... fields ...
  @@assert(valid_score, {{ this.assessment_score >= 0 and this.assessment_score <= 10 }})
  @@assert(bullets_count, {{ this.bullets_review_reason|length >= 1 and this.bullets_review_reason|length <= 5 }})
  @@assert(verbal_length, {{ this.verbal_review_reason|length <= 150 }})
  @@assert(ticket_pair_length, {{ this.meal_ticket_items|length == this.meal_ticket_confidence|length }})
  @@assert(scanned_pair_length, {{ this.scanned_items|length == this.scanned_confidence|length }})
  @@assert(disc_missing_pair_length, {{ this.discrepancies.missing|length == this.discrepancy_confidence.missing|length }})
  @@assert(disc_unexpected_pair_length, {{ this.discrepancies.unexpected|length == this.discrepancy_confidence.unexpected|length }})
  @@assert(valid_legibility, {{ this.meal_ticket_legibility_pct >= 0 and this.meal_ticket_legibility_pct <= 100 }})
}

Manual evaluation of the raw output against all 8 assertions: all pass. (assessment_score=7, bullets=3, verbal=89 chars, ticket items/confidence both length 7, scanned items/confidence both length 6, missing disc/confidence both length 1, unexpected disc/confidence both length 0, legibility=100.0.)
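That manual evaluation can be reproduced mechanically. The sketch below re-checks all 8 assertions in plain Python against the values reported above (field names from the TrayAnalysis schema; list contents are placeholders chosen only to match the observed lengths):

```python
# Re-evaluate the 8 @@assert expressions against the reported output values.
output = {
    "assessment_score": 7,
    "bullets_review_reason": ["a", "b", "c"],   # 3 bullets
    "verbal_review_reason": "x" * 89,           # 89 chars
    "meal_ticket_items": [None] * 7,
    "meal_ticket_confidence": [0.9] * 7,
    "scanned_items": [None] * 6,
    "scanned_confidence": [0.9] * 6,
    "discrepancies": {"missing": [None], "unexpected": []},
    "discrepancy_confidence": {"missing": [0.9], "unexpected": []},
    "meal_ticket_legibility_pct": 100.0,
}

checks = {
    "valid_score": 0 <= output["assessment_score"] <= 10,
    "bullets_count": 1 <= len(output["bullets_review_reason"]) <= 5,
    "verbal_length": len(output["verbal_review_reason"]) <= 150,
    "ticket_pair_length": len(output["meal_ticket_items"]) == len(output["meal_ticket_confidence"]),
    "scanned_pair_length": len(output["scanned_items"]) == len(output["scanned_confidence"]),
    "disc_missing_pair_length": len(output["discrepancies"]["missing"]) == len(output["discrepancy_confidence"]["missing"]),
    "disc_unexpected_pair_length": len(output["discrepancies"]["unexpected"]) == len(output["discrepancy_confidence"]["unexpected"]),
    "valid_legibility": 0 <= output["meal_ticket_legibility_pct"] <= 100,
}

# Every assertion passes on these values.
assert all(checks.values()), [k for k, v in checks.items() if not v]
```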


5. What Needs to Be Troubleshot (and Why)

Issue A: Discarded assertion failure details

The causes variable in validate_asserts() contains exactly the information users need — which assertion failed and what expression evaluated to false. But the return value uses causes: vec![]. This appears intentional (see the // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG. comment), but the rationale isn't documented.

Question for the team: Is there a downstream consumer that pattern-matches on "Assertions failed." with empty causes? If so, could causes be added alongside the existing message without breaking that contract?

Issue B: Possible false-positive assertion failure

The raw output satisfies all 8 assertions when evaluated manually. Three hypotheses:

  1. Field name mismatch during evaluation: The LLM returned "meal_confidence" instead of "meal_ticket_confidence". BAML's jsonish parser may rename fields during coercion, but if @@assert evaluates before the rename completes, this.meal_ticket_confidence could resolve to undefined or an empty array, causing ticket_pair_length to fail.

  2. Jinja evaluation context differs from final output: The run_user_checks function evaluates constraints against &BamlValue. If the BamlValue tree differs from the final JSON serialization (e.g., intermediate coercion state), assertions could fail on data that appears valid in the raw_output string.

  3. Non-deterministic evaluation: Unlikely given post-hoc parsing, but worth ruling out.
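Hypothesis 1 is easy to demonstrate in isolation. The sketch below shows how a renamed key makes a pair-length check fail pre-rename but pass post-rename; the missing-field-resolves-to-empty behavior is an assumption about the evaluation context, not confirmed BAML behavior:

```python
# Hypothetical illustration of hypothesis 1: if assertions run before
# field-name coercion, a renamed key fails ticket_pair_length even though
# the coerced output would pass.

raw = {
    "meal_ticket_items": ["soup", "bread", "juice"],
    "meal_confidence": [0.9, 0.8, 0.95],   # LLM used the wrong key name
}

def field_len(obj, name):
    # Assumption: a missing field resolves to an empty list in this sketch.
    return len(obj.get(name, []))

# Pre-rename evaluation: ticket_pair_length fails (3 != 0).
pre_rename = field_len(raw, "meal_ticket_items") == field_len(raw, "meal_ticket_confidence")

# Post-rename evaluation: parser has mapped meal_confidence -> meal_ticket_confidence.
coerced = dict(raw)
coerced["meal_ticket_confidence"] = coerced.pop("meal_confidence")
post_rename = field_len(coerced, "meal_ticket_items") == field_len(coerced, "meal_ticket_confidence")

print(pre_rename, post_rename)  # False True
```

If run_user_checks sees the pre-rename tree, this alone would explain a "false positive" against a raw output that looks valid.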

Validation needed

  • Add the per-assertion causes to the returned ParsingError and verify which assertion actually failed.
  • Check whether run_user_checks receives the pre-rename or post-rename field values.

6. Steps to Reproduce

Issue A (empty causes) — always reproducible

  1. Environment: Python 3.11.10, baml-py 0.220.0, macOS Darwin 25.3.0
  2. Define a BAML class with any @@assert constraint
  3. Feed it LLM output that violates the assertion
  4. Catch BamlValidationError
  5. Observed: message contains "Assertions failed." with no indication of which assertion
  6. Expected: Error should identify the failing assertion by label and expression

Issue B (false positive) — intermittent

  1. Environment: Same as above, Google Gemini 3.1 Flash Lite Preview
  2. BAML class with 8 @@assert constraints (see Section 4)
  3. Process ~10,000 images through AnalyzeTray function
  4. ~0.13% of responses raise BamlValidationError despite raw_output satisfying all assertions
  5. Same image succeeds on retry ~75% of the time (9 of 12 recovered after one retry)
  6. One image failed 3 consecutive attempts despite valid-looking output each time
  7. Observed: BamlValidationError with "Assertions failed." on structurally valid output
  8. Expected: No error, or error identifying specifically which assertion failed

7. Proposed Direction

Fix A: Populate causes in validate_asserts return value

// Current:
Err(ParsingError {
    causes: vec![],  // discarded
    reason: "Assertions failed.".to_string(),
    scope: vec![],
})

// Proposed:
Err(ParsingError {
    causes,  // the already-computed per-assertion failures
    reason: "Assertions failed.".to_string(),
    scope: vec![],
})

This is a one-line change. The causes variable already contains the right data.

Fix B: Surface causes in BamlValidationError

Add a failed_assertions field (or similar) to ExposedError::ValidationError and propagate to the Python/TypeScript exception. Structured data (assertion label, expression, evaluated value) would allow programmatic handling.
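One possible shape for the extended Python exception, sketched below. The failed_assertions field and its entry keys are a proposal, not the current baml_py API:

```python
# Hypothetical extended exception for Fix B (proposed API, not baml_py's).
class BamlValidationError(Exception):
    def __init__(self, prompt, message, raw_output, detailed_message,
                 failed_assertions=None):
        super().__init__(message)
        self.prompt = prompt
        self.message = message
        self.raw_output = raw_output
        self.detailed_message = detailed_message
        # One entry per failing @@assert: label plus the expression
        # that evaluated to false.
        self.failed_assertions = failed_assertions or []

err = BamlValidationError(
    prompt="...",
    message="Assertions failed.",
    raw_output="{...}",
    detailed_message="...",
    failed_assertions=[
        {"label": "ticket_pair_length",
         "expression": "this.meal_ticket_items|length == this.meal_ticket_confidence|length"},
    ],
)
print([a["label"] for a in err.failed_assertions])  # ['ticket_pair_length']
```

Defaulting failed_assertions to an empty list keeps the existing four-argument call sites working, which matters for the cross-client compatibility concern noted under the risk assessment.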

Fix C: Investigate false-positive path

Requires instrumentation in coerce_class.rs to log the BamlValue passed to run_user_checks alongside the raw output, for cases where assertions fail but the output appears valid.

Risk assessment

  • Fix A is low-risk — it's using data that's already computed and just not passed through.
  • Fix B requires API changes to the error class across Python/TypeScript/Ruby clients.
  • Fix C is diagnostic only.

8. Documentation & Citations

Citation 1 — @@assert failure behavior

  • File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
  • Change Summary: validate_asserts() should propagate per-assertion failure details instead of discarding them
  • Verbatim Quote: "When asserts fail, your BAML function will raise a BamlValidationError exception, same as when parsing fails." and "You can define custom names for each assertion, which will be included in the exception for that failure case."
  • Source URL: https://docs.boundaryml.com/guide/baml-advanced/checks-and-asserts
  • Justification: The docs promise custom assertion names "will be included in the exception" — but the current implementation discards them. This is a doc-behavior mismatch.

Citation 2 — BamlValidationError has only four string fields

  • File: engine/baml-runtime/src/errors.rs
  • Change Summary: ExposedError::ValidationError needs a structured causes field
  • Verbatim Quote: (BamlValidationError attributes) message, prompt, raw_output, detailed_message
  • Source URL: https://docs.boundaryml.com/ref/baml_client/errors/baml-validation-error
  • Justification: No field exists for structured assertion failure data. Users cannot determine which assertion failed programmatically.

Citation 3 — Known assertion evaluation bug with nullable types

  • File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
  • Change Summary: Precedent for assertion evaluation bugs — Issue [bug] Assert evaluation fails on some nullable fields #1962 showed assertions could fail on valid data due to type coercion issues
  • Verbatim Quote: "Regression relative to 0.82. Assert evaluation fails on some nullable fields: Error: content: Failed to evaluate assert: Could not unify Null with WithMetadata"
  • Source URL: [bug] Assert evaluation fails on some nullable fields #1962
  • Justification: Establishes precedent that assertion evaluation can produce false positives due to type handling bugs. Our false-positive observation may stem from a similar class of issue.

Citation 4 — BAML does not use constrained decoding

  • File: engine/baml-runtime/src/internal/llm_client/primitive/google/googleai_client.rs
  • Change Summary: Assertions evaluate post-hoc, not during generation
  • Verbatim Quote: "Parsing the LLM's free-form output... enables you to retain that output quality"
  • Source URL: https://boundaryml.com/blog/structured-outputs-create-false-confidence
  • Justification: Confirms assertions are a post-parsing step. False positives must originate in the parsing/coercion pipeline, not in the LLM generation.

Citation 5 — Assert improvement commit history

  • File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
  • Change Summary: The // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG. comment suggests deliberate design choice
  • Verbatim Quote: CHANGELOG entry: "improve error message for asserts and checks (improve error message for asserts and checks #1975)" by Greg Hale (commit 070ad26)
  • Source URL: 070ad26
  • Justification: Shows the assertion error reporting was actively worked on but the causes-discarding behavior was preserved. Understanding the intent behind this decision is necessary before changing it.

Environment:

  • baml-py: 0.220.0
  • Python: 3.11.10
  • OS: macOS Darwin 25.3.0 (arm64)
  • Provider: Google Gemini 3.1 Flash Lite Preview (via BAML google-ai provider)
  • Volume: ~10,000 LLM calls per batch, 0.13% assertion failure rate
