bug: @@assert BamlValidationError discards assertion failure details — causes always empty #3289

@RobertdWray

Description

1. Why This Is Important

We use BAML @@assert constraints to enforce structural invariants on LLM-generated medical food safety data (array length parity, score ranges, character limits). When an assertion fails, BamlValidationError says "Assertions failed." but provides zero information about which assertion failed. The causes array in the internal ParsingError is always empty.

Business impact: We process ~10,000 tray images per batch run. At a 0.13% assertion failure rate, that's ~13 failures per run. Each failure requires manual investigation because the error message doesn't identify the failing assertion. With 8 assertions per class, the debugging surface is 8x larger than it needs to be. At production scale (multiple facilities, daily runs), this becomes untenable.

Secondary issue: In at least one case, the raw LLM output satisfies all defined assertions when manually evaluated, yet BAML still raises BamlValidationError. This suggests either: (a) assertions are evaluated against an intermediate representation that differs from the final coerced output, or (b) there's a false-positive path in the assertion evaluation pipeline.

Downstream systems affected: Any BAML user relying on @@assert for validation in high-volume pipelines. The error is non-actionable without code-level debugging.


2. Severity Classification

severity:medium — Feature partially broken. @@assert works (it catches violations), but the error reporting is non-functional. Users cannot determine which assertion failed without manual re-evaluation of all assertions against the raw output. The potential false-positive path (second issue) elevates this from low to medium.


3. Files Impacted

  • engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs (directly broken): validate_asserts() builds per-assertion failure details, then discards them
  • engine/baml-runtime/src/errors.rs (structurally limited): ExposedError::ValidationError has no field for structured assertion failure data
  • engine/language_client_python/src/errors.rs (affected): raise_baml_validation_error() passes only four strings, no structured causes
  • engine/baml-lib/jsonish/src/deserializer/coercer/ir_ref/coerce_class.rs (requires verification): class-level @@assert evaluation; may explain the false-positive path

4. Relevant Code (Verbatim)

validate_asserts — failure details are built then discarded

File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs

pub fn validate_asserts(constraints: &[(Constraint, bool)]) -> Result<(), ParsingError> {
    let failing_asserts = constraints
        .iter()
        .filter_map(
            |(Constraint { level, expression, label }, result)| {
                if !result && ConstraintLevel::Assert == *level {
                    Some((label, expression))
                } else {
                    None
                }
            },
        )
        .collect::<Vec<_>>();
    let causes = failing_asserts
        .into_iter()
        .map(|(label, expr)| ParsingError {
            causes: vec![],
            reason: format!(
                "Failed: {}{}",
                label.as_ref().map_or("".to_string(), |l| format!("{l} ")),
                expr.0
            ),
            scope: vec![],
        })
        .collect::<Vec<_>>();
    if !causes.is_empty() {
        Err(ParsingError {
            causes: vec![],  // <-- per-assertion details discarded here
            reason: "Assertions failed.".to_string(),
            // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG.
            scope: vec![],
        })
    } else {
        Ok(())
    }
}

The function computes a causes Vec containing one ParsingError per failing assertion (with label and expression), then returns ParsingError { causes: vec![], ... } — discarding the diagnostics entirely.
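To make the discard concrete, here is a minimal Python sketch that mirrors the Rust logic above (hypothetical dict-based stand-ins for Constraint and ParsingError, not BAML's actual types), with the one-line fix applied so the per-assertion diagnostics are propagated instead of dropped:

```python
# Sketch of validate_asserts with causes propagated (hypothetical types,
# mirroring the Rust code above; not BAML's actual implementation).

def validate_asserts(constraints):
    """constraints: list of (label, expression, passed) tuples."""
    # Build one diagnostic per failing assertion -- this mirrors the
    # `causes` Vec the Rust code computes and then throws away.
    causes = [
        {"reason": f"Failed: {label + ' ' if label else ''}{expr}",
         "causes": [], "scope": []}
        for (label, expr, passed) in constraints
        if not passed
    ]
    if causes:
        return {
            "reason": "Assertions failed.",
            "causes": causes,  # proposed: propagate instead of returning []
            "scope": [],
        }
    return None

err = validate_asserts([
    ("valid_score", "this.assessment_score >= 0 and this.assessment_score <= 10", True),
    ("ticket_pair_length", "this.meal_ticket_items|length == this.meal_ticket_confidence|length", False),
])
print(err["causes"][0]["reason"])
# Failed: ticket_pair_length this.meal_ticket_items|length == this.meal_ticket_confidence|length
```

With causes populated, the caller can tell exactly which labeled assertion failed instead of re-deriving it by hand.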

ExposedError::ValidationError — no structured cause field

File: engine/baml-runtime/src/errors.rs

pub enum ExposedError {
    /// Error in parsing post calling the LLM
    ValidationError {
        prompt: String,
        raw_output: String,
        message: String,
        detailed_message: String,
    },
    // ...
}

Only four string fields; there is no way to carry structured assertion-failure data to the Python/TypeScript client.

Python bridge — four strings only

File: engine/language_client_python/src/errors.rs

fn raise_baml_validation_error(
    prompt: String,
    message: String,
    raw_output: String,
    detailed_message: String,
) -> PyErr {
    Python::with_gil(|py| {
        let internal_monkeypatch = py.import("baml_py.internal_monkeypatch").unwrap();
        let exception = internal_monkeypatch.getattr("BamlValidationError").unwrap();
        let args = (prompt, message, raw_output, detailed_message);
        let inst = exception.call1(args).unwrap();
        PyErr::from_value(inst)
    })
}

Our BAML schema — 8 assertions, all pass on the "failing" output

File: (our project) baml_src/types/tray_analysis.baml

class TrayAnalysis {
  // ... fields ...
  @@assert(valid_score, {{ this.assessment_score >= 0 and this.assessment_score <= 10 }})
  @@assert(bullets_count, {{ this.bullets_review_reason|length >= 1 and this.bullets_review_reason|length <= 5 }})
  @@assert(verbal_length, {{ this.verbal_review_reason|length <= 150 }})
  @@assert(ticket_pair_length, {{ this.meal_ticket_items|length == this.meal_ticket_confidence|length }})
  @@assert(scanned_pair_length, {{ this.scanned_items|length == this.scanned_confidence|length }})
  @@assert(disc_missing_pair_length, {{ this.discrepancies.missing|length == this.discrepancy_confidence.missing|length }})
  @@assert(disc_unexpected_pair_length, {{ this.discrepancies.unexpected|length == this.discrepancy_confidence.unexpected|length }})
  @@assert(valid_legibility, {{ this.meal_ticket_legibility_pct >= 0 and this.meal_ticket_legibility_pct <= 100 }})
}

Manual evaluation of the raw output against all 8 assertions: all pass. (assessment_score=7, bullets=3, verbal=89 chars, ticket items/confidence both length 7, scanned items/confidence both length 6, missing disc/confidence both length 1, unexpected disc/confidence both length 0, legibility=100.0.)
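That manual evaluation can be reproduced mechanically. The sketch below re-checks all 8 assertions in plain Python against the values reported above (field names from the TrayAnalysis schema; list contents are placeholders chosen only to match the observed lengths):

```python
# Re-evaluate the 8 @@assert expressions against the reported output values.
output = {
    "assessment_score": 7,
    "bullets_review_reason": ["a", "b", "c"],   # 3 bullets
    "verbal_review_reason": "x" * 89,           # 89 chars
    "meal_ticket_items": [None] * 7,
    "meal_ticket_confidence": [0.9] * 7,
    "scanned_items": [None] * 6,
    "scanned_confidence": [0.9] * 6,
    "discrepancies": {"missing": [None], "unexpected": []},
    "discrepancy_confidence": {"missing": [0.9], "unexpected": []},
    "meal_ticket_legibility_pct": 100.0,
}

checks = {
    "valid_score": 0 <= output["assessment_score"] <= 10,
    "bullets_count": 1 <= len(output["bullets_review_reason"]) <= 5,
    "verbal_length": len(output["verbal_review_reason"]) <= 150,
    "ticket_pair_length": len(output["meal_ticket_items"]) == len(output["meal_ticket_confidence"]),
    "scanned_pair_length": len(output["scanned_items"]) == len(output["scanned_confidence"]),
    "disc_missing_pair_length": len(output["discrepancies"]["missing"]) == len(output["discrepancy_confidence"]["missing"]),
    "disc_unexpected_pair_length": len(output["discrepancies"]["unexpected"]) == len(output["discrepancy_confidence"]["unexpected"]),
    "valid_legibility": 0 <= output["meal_ticket_legibility_pct"] <= 100,
}

# Every assertion passes on these values.
assert all(checks.values()), [k for k, v in checks.items() if not v]
```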


5. What Needs to Be Troubleshot (and Why)

Issue A: Discarded assertion failure details

The causes variable in validate_asserts() contains exactly the information users need — which assertion failed and what expression evaluated to false. But the return value uses causes: vec![]. This appears intentional (see the // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG. comment), but the rationale isn't documented.

Question for the team: Is there a downstream consumer that pattern-matches on "Assertions failed." with empty causes? If so, could causes be added alongside the existing message without breaking that contract?

Issue B: Possible false-positive assertion failure

The raw output satisfies all 8 assertions when evaluated manually. Three hypotheses:

  1. Field name mismatch during evaluation: The LLM returned "meal_confidence" instead of "meal_ticket_confidence". BAML's jsonish parser may rename fields during coercion, but if @@assert evaluates before the rename completes, this.meal_ticket_confidence could resolve to undefined or an empty array, causing ticket_pair_length to fail.

  2. Jinja evaluation context differs from final output: The run_user_checks function evaluates constraints against &BamlValue. If the BamlValue tree differs from the final JSON serialization (e.g., intermediate coercion state), assertions could fail on data that appears valid in the raw_output string.

  3. Non-deterministic evaluation: Unlikely given post-hoc parsing, but worth ruling out.
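Hypothesis 1 is easy to demonstrate in isolation. The sketch below shows how a renamed key makes a pair-length check fail pre-rename but pass post-rename; the missing-field-resolves-to-empty behavior is an assumption about the evaluation context, not confirmed BAML behavior:

```python
# Hypothetical illustration of hypothesis 1: if assertions run before
# field-name coercion, a renamed key fails ticket_pair_length even though
# the coerced output would pass.

raw = {
    "meal_ticket_items": ["soup", "bread", "juice"],
    "meal_confidence": [0.9, 0.8, 0.95],   # LLM used the wrong key name
}

def field_len(obj, name):
    # Assumption: a missing field resolves to an empty list in this sketch.
    return len(obj.get(name, []))

# Pre-rename evaluation: ticket_pair_length fails (3 != 0).
pre_rename = field_len(raw, "meal_ticket_items") == field_len(raw, "meal_ticket_confidence")

# Post-rename evaluation: parser has mapped meal_confidence -> meal_ticket_confidence.
coerced = dict(raw)
coerced["meal_ticket_confidence"] = coerced.pop("meal_confidence")
post_rename = field_len(coerced, "meal_ticket_items") == field_len(coerced, "meal_ticket_confidence")

print(pre_rename, post_rename)  # False True
```

If run_user_checks sees the pre-rename tree, this alone would explain a "false positive" against a raw output that looks valid.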

Validation needed

  • Add the per-assertion causes to the returned ParsingError and verify which assertion actually failed.
  • Check whether run_user_checks receives the pre-rename or post-rename field values.

6. Steps to Reproduce

Issue A (empty causes) — always reproducible

  1. Environment: Python 3.11.10, baml-py 0.220.0, macOS Darwin 25.3.0
  2. Define a BAML class with any @@assert constraint
  3. Feed it LLM output that violates the assertion
  4. Catch BamlValidationError
  5. Observed: message contains "Assertions failed." with no indication of which assertion
  6. Expected: Error should identify the failing assertion by label and expression

Issue B (false positive) — intermittent

  1. Environment: Same as above, Google Gemini 3.1 Flash Lite Preview
  2. BAML class with 8 @@assert constraints (see Section 4)
  3. Process ~10,000 images through AnalyzeTray function
  4. ~0.13% of responses raise BamlValidationError despite raw_output satisfying all assertions
  5. Same image succeeds on retry ~75% of the time (9 of 12 recovered after one retry)
  6. One image failed 3 consecutive attempts despite valid-looking output each time
  7. Observed: BamlValidationError with "Assertions failed." on structurally valid output
  8. Expected: No error, or error identifying specifically which assertion failed

7. Proposed Direction

Fix A: Populate causes in validate_asserts return value

// Current:
Err(ParsingError {
    causes: vec![],  // discarded
    reason: "Assertions failed.".to_string(),
    scope: vec![],
})

// Proposed:
Err(ParsingError {
    causes,  // the already-computed per-assertion failures
    reason: "Assertions failed.".to_string(),
    scope: vec![],
})

This is a one-line change. The causes variable already contains the right data.

Fix B: Surface causes in BamlValidationError

Add a failed_assertions field (or similar) to ExposedError::ValidationError and propagate to the Python/TypeScript exception. Structured data (assertion label, expression, evaluated value) would allow programmatic handling.
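One possible shape for the extended Python exception, sketched below. The failed_assertions field and its entry keys are a proposal, not the current baml_py API:

```python
# Hypothetical extended exception for Fix B (proposed API, not baml_py's).
class BamlValidationError(Exception):
    def __init__(self, prompt, message, raw_output, detailed_message,
                 failed_assertions=None):
        super().__init__(message)
        self.prompt = prompt
        self.message = message
        self.raw_output = raw_output
        self.detailed_message = detailed_message
        # One entry per failing @@assert: label plus the expression
        # that evaluated to false.
        self.failed_assertions = failed_assertions or []

err = BamlValidationError(
    prompt="...",
    message="Assertions failed.",
    raw_output="{...}",
    detailed_message="...",
    failed_assertions=[
        {"label": "ticket_pair_length",
         "expression": "this.meal_ticket_items|length == this.meal_ticket_confidence|length"},
    ],
)
print([a["label"] for a in err.failed_assertions])  # ['ticket_pair_length']
```

Defaulting failed_assertions to an empty list keeps the existing four-argument call sites working, which matters for the cross-client compatibility concern noted under the risk assessment.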

Fix C: Investigate false-positive path

Requires instrumentation in coerce_class.rs to log the BamlValue passed to run_user_checks alongside the raw output, for cases where assertions fail but the output appears valid.

Risk assessment

  • Fix A is low-risk — it's using data that's already computed and just not passed through.
  • Fix B requires API changes to the error class across Python/TypeScript/Ruby clients.
  • Fix C is diagnostic only.

8. Documentation & Citations

Citation 1 — @@assert failure behavior

  • File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
  • Change Summary: validate_asserts() should propagate per-assertion failure details instead of discarding them
  • Verbatim Quote: "When asserts fail, your BAML function will raise a BamlValidationError exception, same as when parsing fails." and "You can define custom names for each assertion, which will be included in the exception for that failure case."
  • Source URL: https://docs.boundaryml.com/guide/baml-advanced/checks-and-asserts
  • Justification: The docs promise custom assertion names "will be included in the exception" — but the current implementation discards them. This is a doc-behavior mismatch.

Citation 2 — BamlValidationError has only four string fields

  • File: engine/baml-runtime/src/errors.rs
  • Change Summary: ExposedError::ValidationError needs a structured causes field
  • Verbatim Quote: (BamlValidationError attributes) message, prompt, raw_output, detailed_message
  • Source URL: https://docs.boundaryml.com/ref/baml_client/errors/baml-validation-error
  • Justification: No field exists for structured assertion failure data. Users cannot determine which assertion failed programmatically.

Citation 3 — Known assertion evaluation bug with nullable types

  • File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
  • Change Summary: Precedent for assertion evaluation bugs — Issue [bug] Assert evaluation fails on some nullable fields #1962 showed assertions could fail on valid data due to type coercion issues
  • Verbatim Quote: "Regression relative to 0.82. Assert evaluation fails on some nullable fields: Error: content: Failed to evaluate assert: Could not unify Null with WithMetadata"
  • Source URL: [bug] Assert evaluation fails on some nullable fields #1962
  • Justification: Establishes precedent that assertion evaluation can produce false positives due to type handling bugs. Our false-positive observation may stem from a similar class of issue.

Citation 4 — BAML does not use constrained decoding

  • File: engine/baml-runtime/src/internal/llm_client/primitive/google/googleai_client.rs
  • Change Summary: Assertions evaluate post-hoc, not during generation
  • Verbatim Quote: "Parsing the LLM's free-form output... enables you to retain that output quality"
  • Source URL: https://boundaryml.com/blog/structured-outputs-create-false-confidence
  • Justification: Confirms assertions are a post-parsing step. False positives must originate in the parsing/coercion pipeline, not in the LLM generation.

Citation 5 — Assert improvement commit history

  • File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
  • Change Summary: The // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG. comment suggests deliberate design choice
  • Verbatim Quote: CHANGELOG entry: "improve error message for asserts and checks (improve error message for asserts and checks #1975)" by Greg Hale (commit 070ad26)
  • Source URL: 070ad26
  • Justification: Shows the assertion error reporting was actively worked on but the causes-discarding behavior was preserved. Understanding the intent behind this decision is necessary before changing it.

Environment:

  • baml-py: 0.220.0
  • Python: 3.11.10
  • OS: macOS Darwin 25.3.0 (arm64)
  • Provider: Google Gemini 3.1 Flash Lite Preview (via BAML google-ai provider)
  • Volume: ~10,000 LLM calls per batch, 0.13% assertion failure rate
