1. Why This Is Important

We use BAML @@assert constraints to enforce structural invariants on LLM-generated medical food safety data (array length parity, score ranges, character limits). When an assertion fails, BamlValidationError says "Assertions failed." but provides zero information about which assertion failed. The causes array in the internal ParsingError is always empty.
Business impact: We process ~10,000 tray images per batch run. At a 0.13% assertion failure rate, that's ~13 failures per run. Each failure requires manual investigation because the error message doesn't identify the failing assertion. With 8 assertions per class, the debugging surface is 8x larger than it needs to be. At production scale (multiple facilities, daily runs), this becomes untenable.
Secondary issue: In at least one case, the raw LLM output satisfies all defined assertions when manually evaluated, yet BAML still raises BamlValidationError. This suggests either: (a) assertions are evaluated against an intermediate representation that differs from the final coerced output, or (b) there's a false-positive path in the assertion evaluation pipeline.
Downstream systems affected: Any BAML user relying on @@assert for validation in high-volume pipelines. The error is non-actionable without code-level debugging.
2. Severity Classification
severity: medium — Feature partially broken. @@assert works (it catches violations), but the error reporting is non-functional. Users cannot determine which assertion failed without manually re-evaluating all assertions against the raw output. The potential false-positive path (second issue) elevates this from low to medium.
3. Files Impacted

engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs: validate_asserts() builds per-assertion failure details, then discards them
engine/baml-runtime/src/errors.rs: ExposedError::ValidationError has no field for structured assertion failure data
engine/language_client_python/src/errors.rs: raise_baml_validation_error() only passes four strings, with no structured causes
engine/baml-lib/jsonish/src/deserializer/coercer/ir_ref/coerce_class.rs: @@assert evaluation; may explain the false-positive path

4. Relevant Code (Verbatim)

validate_asserts — failure details are built, then discarded
File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
The function computes a causes Vec containing one ParsingError per failing assertion (with label and expression), then returns ParsingError { causes: vec![], ... } — discarding the diagnostics entirely.

ExposedError::ValidationError — no structured cause field
File: engine/baml-runtime/src/errors.rs

```rust
pub enum ExposedError {
    /// Error in parsing post calling the LLM
    ValidationError {
        prompt: String,
        raw_output: String,
        message: String,
        detailed_message: String,
    },
    // ...
}
```

Only four string fields. No way to carry structured assertion failure data to the Python/TypeScript client.

Python bridge — four strings only
File: engine/language_client_python/src/errors.rs
raise_baml_validation_error() passes only message, prompt, raw_output, and detailed_message; there is no parameter for structured causes.

Our BAML schema — 8 assertions, all pass on the "failing" output
File (our project): baml_src/types/tray_analysis.baml
Manual evaluation of the raw output against all 8 assertions: all pass. (assessment_score=7, bullets=3, verbal=89 chars, ticket items/confidence both length 7, scanned items/confidence both length 6, missing disc/confidence both length 1, unexpected disc/confidence both length 0, legibility=100.0.)
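The discard pattern described above, and the proposed one-line fix, can be sketched with simplified stand-in types. This is an illustration, not BAML's actual code: ParsingError, Assert, failing_causes, and the two validate_asserts_* functions are all local stand-ins.

```rust
#[derive(Debug, Clone)]
struct ParsingError {
    message: String,
    causes: Vec<ParsingError>,
}

struct Assert {
    label: &'static str,
    expr: &'static str,
    passed: bool,
}

// Build one ParsingError per failing assertion, mirroring what the report
// says validate_asserts computes today.
fn failing_causes(asserts: &[Assert]) -> Vec<ParsingError> {
    asserts
        .iter()
        .filter(|a| !a.passed)
        .map(|a| ParsingError {
            message: format!("Assertion '{}' failed: {}", a.label, a.expr),
            causes: vec![],
        })
        .collect()
}

// Current behavior: diagnostics are computed, then dropped on return.
fn validate_asserts_discarding(asserts: &[Assert]) -> Option<ParsingError> {
    let causes = failing_causes(asserts);
    (!causes.is_empty()).then(|| ParsingError {
        message: "Assertions failed.".to_string(),
        causes: vec![], // <- the reported bug: the computed causes are discarded
    })
}

// Proposed Fix A: identical, except the computed causes are attached.
fn validate_asserts_propagating(asserts: &[Assert]) -> Option<ParsingError> {
    let causes = failing_causes(asserts);
    (!causes.is_empty()).then(|| ParsingError {
        message: "Assertions failed.".to_string(),
        causes,
    })
}

fn main() {
    let asserts = [
        Assert {
            label: "ticket_pair_length",
            expr: "this.meal_ticket_items|length == this.meal_ticket_confidence|length",
            passed: false,
        },
        Assert { label: "score_range", expr: "this.assessment_score <= 10", passed: true },
    ];
    let err = validate_asserts_propagating(&asserts).unwrap();
    println!("{} ({} cause(s))", err.message, err.causes.len());
    for c in &err.causes {
        println!("  - {}", c.message);
    }
}
```

Note that the outer message stays "Assertions failed." in both variants, so a consumer matching on that string would be unaffected by the fix; only the causes become inspectable.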
5. What Needs to Be Troubleshot (and Why)
Issue A: Discarded assertion failure details
The causes variable in validate_asserts() contains exactly the information users need — which assertion failed and what expression evaluated to false. But the return value uses causes: vec![]. This appears intentional (see the // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG. comment), but the rationale isn't documented.
Question for the team: Is there a downstream consumer that pattern-matches on "Assertions failed." with empty causes? If so, could causes be added alongside the existing message without breaking that contract?
Issue B: Possible false-positive assertion failure
The raw output satisfies all 8 assertions when evaluated manually. Three hypotheses:
Field name mismatch during evaluation: The LLM returned "meal_confidence" instead of "meal_ticket_confidence". BAML's jsonish parser may rename fields during coercion, but if @@assert evaluates before the rename completes, this.meal_ticket_confidence could resolve to undefined or an empty array, causing ticket_pair_length to fail.
Jinja evaluation context differs from final output: The run_user_checks function evaluates constraints against &BamlValue. If the BamlValue tree differs from the final JSON serialization (e.g., intermediate coercion state), assertions could fail on data that appears valid in the raw_output string.
Non-deterministic evaluation: Unlikely given post-hoc parsing, but worth ruling out.
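The first hypothesis (field name mismatch) can be made concrete with a toy model. A HashMap stands in for the parser's field map, and lens_equal is a hypothetical helper, not BAML code; the point is only that a missing or not-yet-renamed key evaluates like an empty array, which breaks a length-parity assertion that the final output would pass.

```rust
use std::collections::HashMap;

// Length-parity check in the spirit of ticket_pair_length: missing keys
// behave like empty arrays, mimicking an undefined field lookup.
fn lens_equal(fields: &HashMap<&str, Vec<f64>>, a: &str, b: &str) -> bool {
    let la = fields.get(a).map_or(0, |v| v.len());
    let lb = fields.get(b).map_or(0, |v| v.len());
    la == lb
}

fn main() {
    let items = vec![1.0; 7];
    let conf = vec![0.9; 7];

    // Pre-rename: the LLM's original key is still "meal_confidence".
    let mut pre = HashMap::new();
    pre.insert("meal_ticket_items", items.clone());
    pre.insert("meal_confidence", conf.clone());

    // Post-rename: the coercer has mapped the key to the schema name.
    let mut post = HashMap::new();
    post.insert("meal_ticket_items", items);
    post.insert("meal_ticket_confidence", conf);

    // Pre-rename the lookup finds nothing (0 != 7); post-rename it passes (7 == 7).
    println!("pre-rename passes:  {}", lens_equal(&pre, "meal_ticket_items", "meal_ticket_confidence"));
    println!("post-rename passes: {}", lens_equal(&post, "meal_ticket_items", "meal_ticket_confidence"));
}
```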
Validation needed
Add the per-assertion causes to the returned ParsingError and verify which assertion actually failed.
Check whether run_user_checks receives the pre-rename or post-rename field values.
6. Steps to Reproduce
Issue A (empty causes) — always reproducible
Environment: Python 3.11.10, baml-py 0.220.0, macOS Darwin 25.3.0
Define a BAML class with any @@assert constraint
Feed it LLM output that violates the assertion
Catch BamlValidationError
Observed: message contains "Assertions failed." with no indication of which assertion failed
Expected: Error should identify the failing assertion by label and expression
Issue B (false positive) — intermittent
Environment: Same as above, Google Gemini 3.1 Flash Lite Preview
BAML class with 8 @@assert constraints (see Section 4)
Process ~10,000 images through AnalyzeTray function
~0.13% of responses raise BamlValidationError despite raw_output satisfying all assertions
Same image succeeds on retry ~75% of the time (9 of 12 recovered after one retry)
One image failed 3 consecutive attempts despite valid-looking output each time
Observed: BamlValidationError with "Assertions failed." on structurally valid output
Expected: No error, or error identifying specifically which assertion failed
7. Proposed Direction
Fix A: Populate causes in validate_asserts return value
This is a one-line change. The causes variable already contains the right data.
Fix B: Surface causes in BamlValidationError
Add a failed_assertions field (or similar) to ExposedError::ValidationError and propagate to the Python/TypeScript exception. Structured data (assertion label, expression, evaluated value) would allow programmatic handling.
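One possible shape for that structured field, sketched with local stand-in types. FailedAssertion and the field name failed_assertions are suggestions only, not BAML's actual API, and this enum is a simplified copy, not the real ExposedError.

```rust
#[derive(Debug, Clone)]
struct FailedAssertion {
    label: Option<String>, // user-supplied @@assert name, if any
    expression: String,    // the expression that evaluated to false
}

#[derive(Debug)]
enum ExposedError {
    ValidationError {
        prompt: String,
        raw_output: String,
        message: String,
        detailed_message: String,
        failed_assertions: Vec<FailedAssertion>, // proposed addition
    },
}

// Render the error with its structured causes, so a client can see which
// assertion failed without re-evaluating all of them.
fn describe(err: &ExposedError) -> String {
    match err {
        ExposedError::ValidationError { message, failed_assertions, .. } => {
            let details: Vec<String> = failed_assertions
                .iter()
                .map(|f| format!("{}: {}", f.label.as_deref().unwrap_or("<unnamed>"), f.expression))
                .collect();
            format!("{} [{}]", message, details.join("; "))
        }
    }
}

fn main() {
    let err = ExposedError::ValidationError {
        prompt: String::new(),
        raw_output: String::new(),
        message: "Assertions failed.".to_string(),
        detailed_message: String::new(),
        failed_assertions: vec![FailedAssertion {
            label: Some("ticket_pair_length".to_string()),
            expression: "this.meal_ticket_items|length == this.meal_ticket_confidence|length".to_string(),
        }],
    };
    println!("{}", describe(&err));
}
```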
Fix C: Investigate false-positive path
Requires instrumentation in coerce_class.rs to log the BamlValue passed to run_user_checks alongside the raw output, for cases where assertions fail but the output appears valid.
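A minimal sketch of what that instrumentation could look like. All names here are hypothetical: a BTreeMap stands in for the coerced BamlValue, and run_user_checks is a placeholder for the real evaluator. The idea is just to log the evaluated value next to raw_output whenever a check fails, so the two can be diffed offline.

```rust
use std::collections::BTreeMap;

// Stand-in for a coerced BamlValue class instance.
type FieldMap = BTreeMap<String, String>;

// Placeholder evaluator: fails when an expected field is absent,
// mimicking the pre-rename scenario from Section 5.
fn run_user_checks(value: &FieldMap) -> Result<(), String> {
    if value.contains_key("meal_ticket_confidence") {
        Ok(())
    } else {
        Err("Assertions failed.".to_string())
    }
}

// Wrapper: on failure, dump both representations for post-hoc comparison.
fn run_user_checks_instrumented(value: &FieldMap, raw_output: &str) -> Result<(), String> {
    run_user_checks(value).map_err(|e| {
        eprintln!(
            "assert-fail evaluated_fields={:?} raw_output={}",
            value.keys().collect::<Vec<_>>(),
            raw_output
        );
        e
    })
}

fn main() {
    let mut pre_rename = FieldMap::new();
    pre_rename.insert("meal_confidence".to_string(), "[0.9]".to_string());
    let raw = r#"{"meal_confidence": [0.9]}"#;
    let result = run_user_checks_instrumented(&pre_rename, raw);
    println!("failed: {}", result.is_err());
}
```

If the logged field set shows a key that raw_output spells differently, the pre-/post-rename hypothesis is confirmed directly from production logs.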
Risk assessment
Fix A is low-risk — it's using data that's already computed and just not passed through.
Fix B requires API changes to the error class across Python/TypeScript/Ruby clients.
8. Documentation & Citations

Citation 1 — @@assert failure behavior
File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
Change Summary: validate_asserts() should propagate per-assertion failure details instead of discarding them
Verbatim Quote: "When asserts fail, your BAML function will raise a BamlValidationError exception, same as when parsing fails." and "You can define custom names for each assertion, which will be included in the exception for that failure case."
Justification: The docs promise custom assertion names "will be included in the exception," but the current implementation discards them. This is a doc-behavior mismatch.

Citation 2 — BamlValidationError has only four string fields
File: engine/baml-runtime/src/errors.rs
Change Summary: ExposedError::ValidationError needs a structured causes field
Verbatim Quote: "message, prompt, raw_output, detailed_message"

Citation 3 — Known assertion evaluation bug with nullable types
File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs
Verbatim Quote: "Regression relative to 0.82. Assert evaluation fails on some nullable fields: Error: content: Failed to evaluate assert: Could not unify Null with WithMetadata"
Justification: Establishes precedent that assertion evaluation can produce false positives due to type handling bugs. Our false-positive observation may stem from a similar class of issue.

Citation 4 — BAML does not use constrained decoding
File: engine/baml-runtime/src/internal/llm_client/primitive/google/googleai_client.rs
Justification: Confirms assertions are a post-parsing step. False positives must originate in the parsing/coercion pipeline, not in the LLM generation.

Citation 5 — Assert improvement commit history
File: engine/baml-lib/jsonish/src/deserializer/coercer/field_type.rs (commit 070ad26)
Justification: Shows the assertion error reporting was actively worked on but the causes-discarding behavior was preserved. The // IMPORTANT: DO NOT CHANGE THIS MESSAGE. TALK TO GREG. comment suggests a deliberate design choice; understanding the intent behind it is necessary before changing it.
Environment:
baml-py: 0.220.0
Python: 3.11.10
OS: macOS Darwin 25.3.0 (arm64)
Provider: Google Gemini 3.1 Flash Lite Preview (via BAML google-ai provider)
Volume: ~10,000 LLM calls per batch, 0.13% assertion failure rate