fix(inference-logging-client): decode FP16 scalars as bfloat16 and ui… by the-tushar-meesho · Pull Request #377 · Meesho/BharatMLStack

the-tushar-meesho · 2026-06-03T18:33:23Z

…nt8 vectors as base64

The Go encoder writes FP16 scalars as the top 16 bits of a float32 (bfloat16: uint16(Float32bits>>16)), not IEEE-754 half precision, so the client decoded them wrong (e.g. 1.5 -> 1.9375). Add decode_bfloat16 and use it for the scalar FP16 path; binary FP16 vector elements remain true IEEE-754 and keep using decode_ieee754_fp16.

Go's json.Marshal([]uint8) emits a base64 JSON string ("AQI="), not a JSON array, so uint8 vectors decoded to raw quoted-string bytes. Decode the base64 string form, falling back to raw bytes for byte-sourced uint8 vectors.

Verified round-trip parity against Go ConvertStringToType output across all 23 scalar/vector types (0 failures), with no regression on the binary vector path.

🔁 Pull Request Template – BharatMLStack

Please fill out the following sections to help us review your changes efficiently.

Context:

Give a brief overview of the motivation behind this change. Include any relevant discussion links (Slack, documents, tickets, etc.) that help reviewers understand the background and the issue being addressed.

Describe your changes:

Mention the changes made in the codebase.

Testing:

Please describe how you tested the code. If manual tests were performed - please explain how. If automatic tests were added or existing ones cover the change - please explain how did you run them.

Monitoring:

Explain how this change will be tracked after deployment. Indicate whether current dashboards, alerts, and logs are enough, or if additional instrumentation is required.

Rollback plan

Explain rollback plan in case of issues.

Checklist before requesting a review

I have reviewed my own changes?
Relevant or critical functionality is covered by tests?
Monitoring needs have been evaluated?
Any necessary documentation updates have been considered?

📂 Modules Affected

horizon (Real-time systems / networking)
online-feature-store (Feature serving infra)
trufflebox-ui (Admin panel / UI)
infra (Docker, CI/CD, GCP/AWS setup)
docs (Documentation updates)
Other: ___________

✅ Type of Change

📊 Benchmark / Metrics (if applicable)

…nt8 vectors as base64 The Go encoder writes FP16 scalars as the top 16 bits of a float32 (bfloat16: uint16(Float32bits>>16)), not IEEE-754 half precision, so the client decoded them wrong (e.g. 1.5 -> 1.9375). Add decode_bfloat16 and use it for the scalar FP16 path; binary FP16 vector elements remain true IEEE-754 and keep using decode_ieee754_fp16. Go's json.Marshal([]uint8) emits a base64 JSON string ("AQI="), not a JSON array, so uint8 vectors decoded to raw quoted-string bytes. Decode the base64 string form, falling back to raw bytes for byte-sourced uint8 vectors. Verified round-trip parity against Go ConvertStringToType output across all 23 scalar/vector types (0 failures), with no regression on the binary vector path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-03T18:33:32Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)

develop

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c791a10f-31b7-4ffc-8dfb-f240b66b9a9d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Add tests/test_go_parity.py pinning the decoder as an exact inverse of the Go encoder (ConvertStringToType / go-core datatypeconverter) using real Go byte fixtures for all 23 scalar/vector types, plus FP16-scalar-as-bfloat16 and the binary byte-column path (IEEE-754 FP16 elements, raw uint8) to guard against regressions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…oat precision * FP8 E4M3 / E5M2: port go-core float8.FP8E4M3ToFP32Value / FP8E5M2ToFP32Value bit-for-bit (scalars and vector elements) instead of returning the raw byte, so FP8 values match the feature-store encoding. * format_float: stop rounding decoded floats to 6 decimals (which silently changed the value, e.g. FP16 0.1 -> 0.099976). Return the exact float32/ float64 value to match Go's BytesToString value precision; display-level rounding remains the job of format_dataframe_floats. * tests: add FP8 scalar/vector fixtures and a full-precision float assertion. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…r port Add go_datatypeconverter.py: a byte-for-byte Python port of go-core typeconverter.BytesToString, so byte-column / feature-store values decode to exactly the same string Go produces, for every type: * ints/uints/bool: decimal/bool like fmt.Sprint * fp32/fp64: Go shortest-%g formatting incl. the %f<->%e cutover (exponential when exp < -4 or exp >= 6, e.g. 1000000 -> "1e+06") * fp16: IEEE-754 half -> float32 -> Go float string * fp8 E4M3 / E5M2: ported bit-for-bit from go-core float8 * vectors: comma-joined (no brackets), binary packed elements Validated byte-for-byte against go-core BytesToString output for all 23+ types plus a float-formatting sweep (0 failures). Tests in tests/test_go_datatypeconverter.py carry the Go-captured fixtures. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…strings) Wire the go-core datatypeconverter port into the proto decode path via an opt-in go_string flag (default False preserves the existing numeric output): * decoder.decode_feature_to_go_string: decode a feature's bytes to the exact string go-core BytesToString emits. Byte-path/binary values go through the go_datatypeconverter port; string-path values (ConvertStringToType JSON vectors, base64 uint8) are detected by leading byte and reformatted into the same Go-canonical comma string. * decode_proto_features / decode_proto_format / decode_mplog take go_string; decode_mplog raises NotImplementedError for non-proto formats with go_string. * tests: full framed-proto-row round trip (binary + JSON/base64 string-path) asserting Go-identical output. FP16 scalars remain the one ambiguity (2 bytes are bf16 or IEEE-754 half with no marker); go_string treats them as canonical IEEE-754. Exact both-path FP16 parity needs the model-proxy encoder to emit canonical IEEE-754. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… fp16 types Add COVERAGE_FIXTURES: every vector element type (bool, int8/16/64/32, uint8/16/32/64, fp16, fp32, fp64, fp8 e4m3/e5m2) with negatives/edges, plus FS-style FP16 scalars (incl IEEE-754 0.1 -> "0.099975586" and the 65000 clamp -> "64992"). All captured from go-core BytesToString on byte-path/binary bytes -- the format the inference log actually stores -- and asserted against the go_datatypeconverter port. Confirms every scalar and vector case decodes byte-for-byte identical to go-core. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…dation Fuzzing the go_datatypeconverter port against go-core BytesToString over 423,563 inputs (ALL 65,536 FP16 values, ALL 256 FP8 E4M3/E5M2, ALL int8/uint8, + 380k random FP32/FP64/int/uint scalars and vectors of every element type) surfaced one divergence: negative zero. Go's strconv prints negative zero as "-0" (sign bit preserved); the port returned "0". Fixed go_format_float to emit "-0" for negative zero. After the fix: 423,563 / 423,563 byte-for-byte identical to go-core, covering NaN, +/-Inf, subnormals, scientific-notation boundaries, negative zero, and every vector element type. Add test_float_edge_cases_match_go pinning the trickiest fuzz-derived edges (NaN, +/-Inf, -0, subnormals, scientific) so they can't regress. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ult, all formats Make byte-for-byte go-core parity the default across every format and code path: * decode_feature_value(go_string=True default) -> delegates to the go_datatypeconverter port, so Arrow and Parquet decode paths (and the Spark DataFrame batch decoders that call them) emit exactly go-core BytesToString output. go_string=False restores the legacy typed (int/float/list) output. * decode_proto_features / decode_proto_format / decode_mplog default go_string=True; the proto path threads the flag explicitly. * Removed the proto-only NotImplementedError guard (arrow/parquet now supported). Net: decode_mplog() returns go-identical strings for proto, arrow, and parquet by default; pass go_string=False for legacy typed output (proto fully honors it). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

go-core byte-for-byte parity decode is now the default across proto/arrow/ parquet (decode_mplog go_string=True). Output changes from typed values to go-core BytesToString strings by default; pass go_string=False for legacy typed output. Minor bump to flag the behavior change.

Patch bump over 0.3.9 carrying the go-core byte-for-byte parity decode work.

the-tushar-meesho and others added 9 commits June 4, 2026 00:17

chore(inference-logging-client): bump version to 0.3.10

5141e98

Patch bump over 0.3.9 carrying the go-core byte-for-byte parity decode work.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(inference-logging-client): decode FP16 scalars as bfloat16 and ui…#377

fix(inference-logging-client): decode FP16 scalars as bfloat16 and ui…#377
the-tushar-meesho wants to merge 10 commits into
fix/inference-logging-client-arrow-binary-typefrom
fix/ilc-fp16-uint8-parity

the-tushar-meesho commented Jun 3, 2026

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

the-tushar-meesho commented Jun 3, 2026

🔁 Pull Request Template – BharatMLStack

Context:

Describe your changes:

Testing:

Monitoring:

Rollback plan

Checklist before requesting a review

📂 Modules Affected

✅ Type of Change

📊 Benchmark / Metrics (if applicable)

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading