fix(inference-logging-client): decode FP16 scalars as bfloat16 and ui…#377
Open
the-tushar-meesho wants to merge 10 commits into
Open
fix(inference-logging-client): decode FP16 scalars as bfloat16 and ui…#377the-tushar-meesho wants to merge 10 commits into
the-tushar-meesho wants to merge 10 commits into
Conversation
…nt8 vectors as base64
The Go encoder writes FP16 scalars as the top 16 bits of a float32
(bfloat16: uint16(Float32bits>>16)), not IEEE-754 half precision, so the
client decoded them wrong (e.g. 1.5 -> 1.9375). Add decode_bfloat16 and use it
for the scalar FP16 path; binary FP16 vector elements remain true IEEE-754 and
keep using decode_ieee754_fp16.
Go's json.Marshal([]uint8) emits a base64 JSON string ("AQI="), not a JSON
array, so uint8 vectors decoded to raw quoted-string bytes. Decode the base64
string form, falling back to raw bytes for byte-sourced uint8 vectors.
Verified round-trip parity against Go ConvertStringToType output across all 23
scalar/vector types (0 failures), with no regression on the binary vector path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Comment |
Add tests/test_go_parity.py pinning the decoder as an exact inverse of the Go encoder (ConvertStringToType / go-core datatypeconverter) using real Go byte fixtures for all 23 scalar/vector types, plus FP16-scalar-as-bfloat16 and the binary byte-column path (IEEE-754 FP16 elements, raw uint8) to guard against regressions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oat precision * FP8 E4M3 / E5M2: port go-core float8.FP8E4M3ToFP32Value / FP8E5M2ToFP32Value bit-for-bit (scalars and vector elements) instead of returning the raw byte, so FP8 values match the feature-store encoding. * format_float: stop rounding decoded floats to 6 decimals (which silently changed the value, e.g. FP16 0.1 -> 0.099976). Return the exact float32/ float64 value to match Go's BytesToString value precision; display-level rounding remains the job of format_dataframe_floats. * tests: add FP8 scalar/vector fixtures and a full-precision float assertion. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r port Add go_datatypeconverter.py: a byte-for-byte Python port of go-core typeconverter.BytesToString, so byte-column / feature-store values decode to exactly the same string Go produces, for every type: * ints/uints/bool: decimal/bool like fmt.Sprint * fp32/fp64: Go shortest-%g formatting incl. the %f<->%e cutover (exponential when exp < -4 or exp >= 6, e.g. 1000000 -> "1e+06") * fp16: IEEE-754 half -> float32 -> Go float string * fp8 E4M3 / E5M2: ported bit-for-bit from go-core float8 * vectors: comma-joined (no brackets), binary packed elements Validated byte-for-byte against go-core BytesToString output for all 23+ types plus a float-formatting sweep (0 failures). Tests in tests/test_go_datatypeconverter.py carry the Go-captured fixtures. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…strings) Wire the go-core datatypeconverter port into the proto decode path via an opt-in go_string flag (default False preserves the existing numeric output): * decoder.decode_feature_to_go_string: decode a feature's bytes to the exact string go-core BytesToString emits. Byte-path/binary values go through the go_datatypeconverter port; string-path values (ConvertStringToType JSON vectors, base64 uint8) are detected by leading byte and reformatted into the same Go-canonical comma string. * decode_proto_features / decode_proto_format / decode_mplog take go_string; decode_mplog raises NotImplementedError for non-proto formats with go_string. * tests: full framed-proto-row round trip (binary + JSON/base64 string-path) asserting Go-identical output. FP16 scalars remain the one ambiguity (2 bytes are bf16 or IEEE-754 half with no marker); go_string treats them as canonical IEEE-754. Exact both-path FP16 parity needs the model-proxy encoder to emit canonical IEEE-754. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… fp16 types Add COVERAGE_FIXTURES: every vector element type (bool, int8/16/64/32, uint8/16/32/64, fp16, fp32, fp64, fp8 e4m3/e5m2) with negatives/edges, plus FS-style FP16 scalars (incl IEEE-754 0.1 -> "0.099975586" and the 65000 clamp -> "64992"). All captured from go-core BytesToString on byte-path/binary bytes -- the format the inference log actually stores -- and asserted against the go_datatypeconverter port. Confirms every scalar and vector case decodes byte-for-byte identical to go-core. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dation Fuzzing the go_datatypeconverter port against go-core BytesToString over 423,563 inputs (ALL 65,536 FP16 values, ALL 256 FP8 E4M3/E5M2, ALL int8/uint8, + 380k random FP32/FP64/int/uint scalars and vectors of every element type) surfaced one divergence: negative zero. Go's strconv prints negative zero as "-0" (sign bit preserved); the port returned "0". Fixed go_format_float to emit "-0" for negative zero. After the fix: 423,563 / 423,563 byte-for-byte identical to go-core, covering NaN, +/-Inf, subnormals, scientific-notation boundaries, negative zero, and every vector element type. Add test_float_edge_cases_match_go pinning the trickiest fuzz-derived edges (NaN, +/-Inf, -0, subnormals, scientific) so they can't regress. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ult, all formats Make byte-for-byte go-core parity the default across every format and code path: * decode_feature_value(go_string=True default) -> delegates to the go_datatypeconverter port, so Arrow and Parquet decode paths (and the Spark DataFrame batch decoders that call them) emit exactly go-core BytesToString output. go_string=False restores the legacy typed (int/float/list) output. * decode_proto_features / decode_proto_format / decode_mplog default go_string=True; the proto path threads the flag explicitly. * Removed the proto-only NotImplementedError guard (arrow/parquet now supported). Net: decode_mplog() returns go-identical strings for proto, arrow, and parquet by default; pass go_string=False for legacy typed output (proto fully honors it). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
go-core byte-for-byte parity decode is now the default across proto/arrow/ parquet (decode_mplog go_string=True). Output changes from typed values to go-core BytesToString strings by default; pass go_string=False for legacy typed output. Minor bump to flag the behavior change.
Patch bump over 0.3.9 carrying the go-core byte-for-byte parity decode work.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…nt8 vectors as base64
The Go encoder writes FP16 scalars as the top 16 bits of a float32 (bfloat16: uint16(Float32bits>>16)), not IEEE-754 half precision, so the client decoded them wrong (e.g. 1.5 -> 1.9375). Add decode_bfloat16 and use it for the scalar FP16 path; binary FP16 vector elements remain true IEEE-754 and keep using decode_ieee754_fp16.
Go's json.Marshal([]uint8) emits a base64 JSON string ("AQI="), not a JSON array, so uint8 vectors decoded to raw quoted-string bytes. Decode the base64 string form, falling back to raw bytes for byte-sourced uint8 vectors.
Verified round-trip parity against Go ConvertStringToType output across all 23 scalar/vector types (0 failures), with no regression on the binary vector path.
🔁 Pull Request Template – BharatMLStack
Context:
Give a brief overview of the motivation behind this change. Include any relevant discussion links (Slack, documents, tickets, etc.) that help reviewers understand the background and the issue being addressed.
Describe your changes:
Mention the changes made in the codebase.
Testing:
Please describe how you tested the code. If manual tests were performed - please explain how. If automatic tests were added or existing ones cover the change - please explain how did you run them.
Monitoring:
Explain how this change will be tracked after deployment. Indicate whether current dashboards, alerts, and logs are enough, or if additional instrumentation is required.
Rollback plan
Explain rollback plan in case of issues.
Checklist before requesting a review
📂 Modules Affected
horizon(Real-time systems / networking)online-feature-store(Feature serving infra)trufflebox-ui(Admin panel / UI)infra(Docker, CI/CD, GCP/AWS setup)docs(Documentation updates)___________✅ Type of Change
___________📊 Benchmark / Metrics (if applicable)