refactor: migrate embedding storage to BF16 format #1246

Merged
zhenghaoz merged 3 commits into gorse-io:master from zhangzhenghao:bf16
Apr 21, 2026

@zhangzhenghao
Contributor

Summary

This PR migrates embedding storage from float32 to BF16 (bfloat16) format for memory efficiency.

Changes

  • Item/User label embeddings: Convert float32/float64 slices to BF16 (uint16) storage
  • CTR model: Update Embedding.Value type from []float32 to []uint16
  • Dataset processing: Add support for direct []float32 and []float64 label input
  • Tests: Update test cases to use BF16 format for embedded labels
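
The float32-to-BF16 conversion listed above can be sketched as follows. This is a minimal illustration, not the code in common/bfloats: the function names `float32ToBF16` and `bf16ToFloat32` are hypothetical, and round-to-nearest-even is an assumption about the rounding mode (plain truncation of the low 16 bits is another common choice).

```go
package main

import (
	"fmt"
	"math"
)

// float32ToBF16 (hypothetical name) keeps the upper 16 bits of an IEEE 754
// float32: the sign bit, all 8 exponent bits, and the top 7 mantissa bits.
// The discarded low 16 bits are rounded to nearest, ties to even.
func float32ToBF16(f float32) uint16 {
	bits := math.Float32bits(f)
	if f != f {
		// NaN: set a mantissa bit explicitly so the result stays NaN.
		return uint16(bits>>16) | 0x0040
	}
	// Adding 0x7FFF plus the lowest kept bit implements ties-to-even.
	bits += 0x7FFF + ((bits >> 16) & 1)
	return uint16(bits >> 16)
}

// bf16ToFloat32 (hypothetical name) widens a BF16 bit pattern back to
// float32 by zero-filling the 16 mantissa bits that were dropped.
func bf16ToFloat32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	for _, v := range []float32{1.0, 0.1, -3.14159} {
		b := float32ToBF16(v)
		fmt.Printf("%v -> 0x%04X -> %v\n", v, b, bf16ToFloat32(b))
	}
}
```

Because BF16 keeps the full float32 exponent, the conversion changes precision but not range, which is why storing the raw bit pattern in a uint16 is sufficient.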

Related PRs

Benefits

  • ~50% memory reduction for embedding storage
  • Maintains sufficient precision for ML workloads
  • Consistent BF16 usage across the codebase
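
The ~50% figure follows from element size alone: a float32 takes 4 bytes and a BF16 bit pattern stored as uint16 takes 2. The sketch below works through that arithmetic. The catalogue size and embedding dimension are hypothetical values chosen for illustration, and per-slice header overhead is ignored, which is one reason the real saving is approximate rather than exact.

```go
package main

import "fmt"

const (
	float32Size = 4 // bytes per float32 element
	bf16Size    = 2 // bytes per BF16 element stored as uint16
)

// payloadBytes returns the bytes taken by the elements of n embeddings of
// dimension dim, ignoring slice headers and allocator overhead.
func payloadBytes(n, dim, elemSize int) int64 {
	return int64(n) * int64(dim) * int64(elemSize)
}

func main() {
	// Hypothetical workload: one million items, 512-dimensional embeddings.
	const n, dim = 1_000_000, 512

	f32 := payloadBytes(n, dim, float32Size)
	bf16 := payloadBytes(n, dim, bf16Size)

	fmt.Printf("float32: %d MiB\n", f32>>20)
	fmt.Printf("bf16:    %d MiB\n", bf16>>20)
	fmt.Printf("saved:   %.0f%%\n", 100*float64(f32-bf16)/float64(f32))
}
```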

- Convert float32/float64 slices in labels to BF16 (uint16) format
- Update Embedding.Value type from []float32 to []uint16 in CTR model
- Add support for []float32 and []float64 direct label input
- Update tests to use BF16 format for embedded labels
@codecov

codecov Bot commented Apr 19, 2026

Codecov Report

❌ Patch coverage is 69.86301% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.28%. Comparing base (2e73b7c) to head (a5ddae4).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| common/bfloats/bfloats.go | 69.56% | 14 Missing ⚠️ |
| model/ctr/data.go | 60.00% | 4 Missing ⚠️ |
| logics/item_to_item.go | 76.92% | 2 Missing and 1 partial ⚠️ |
| master/tasks.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1246      +/-   ##
==========================================
+ Coverage   73.01%   73.28%   +0.27%     
==========================================
  Files          91       91              
  Lines       16694    16704      +10     
==========================================
+ Hits        12189    12242      +53     
+ Misses       3262     3250      -12     
+ Partials     1243     1212      -31     

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates embedding storage across CTR and item-to-item logic from []float32/[]float64 to BF16 bit-pattern storage ([]uint16) to reduce memory usage and align embedding handling across the codebase.

Changes:

  • Update CTR Embedding.Value to []uint16 and propagate through AFM batch prediction code paths.
  • Convert dataset label embedding processing to store embeddings as BF16 ([]uint16) and add a BF16 FromAny helper for mixed numeric inputs.
  • Update unit tests to assert against BF16-encoded embeddings.
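
To illustrate what a helper for mixed numeric inputs might look like, here is a minimal sketch. It is not the actual bfloats.FromAny, whose signature and supported types are not shown in this conversation; `encodeMixed`, `float32ToBF16`, the set of accepted types, and the error behavior are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// float32ToBF16 (hypothetical helper) rounds a float32 to the nearest BF16
// bit pattern, ties to even.
func float32ToBF16(f float32) uint16 {
	bits := math.Float32bits(f)
	if f != f {
		return uint16(bits>>16) | 0x0040 // keep NaN a NaN
	}
	bits += 0x7FFF + ((bits >> 16) & 1)
	return uint16(bits >> 16)
}

// encodeMixed is a hypothetical stand-in for a FromAny-style helper: it
// accepts a slice of mixed numeric values, such as one decoded from JSON,
// and returns BF16 bit patterns, rejecting non-numeric elements.
func encodeMixed(values []any) ([]uint16, error) {
	out := make([]uint16, len(values))
	for i, v := range values {
		var f float32
		switch x := v.(type) {
		case float32:
			f = x
		case float64:
			f = float32(x)
		case int:
			f = float32(x)
		case int32:
			f = float32(x)
		case int64:
			f = float32(x)
		default:
			return nil, fmt.Errorf("element %d: unsupported type %T", i, v)
		}
		out[i] = float32ToBF16(f)
	}
	return out, nil
}

func main() {
	encoded, err := encodeMixed([]any{float32(0.5), 0.25, 2})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%#x\n", encoded)
}
```

Returning an error on the first non-numeric element is one possible design; a real helper could instead report whether the slice was numeric so that callers can fall back to treating it as ordinary labels.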

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| model/ctr/model_test.go | Updates CTR model tests to pass BF16 embeddings ([]uint16). |
| model/ctr/fm_xla.go | Stops converting embeddings to BF16 inside batch prediction (now already BF16). |
| model/ctr/fm.go | Same as above for the non-XLA AFM path; expects BF16 embeddings. |
| model/ctr/data_test.go | Updates embedding conversion tests to assert BF16 output. |
| model/ctr/data.go | Changes Embedding.Value to []uint16 and converts input embedding representations to BF16. |
| master/tasks.go | Stores item embeddings directly as BF16 in CTR dataset construction. |
| logics/item_to_item_test.go | Removes toFloat32Slice tests (conversion responsibility moved). |
| logics/item_to_item.go | Switches embedding item-to-item ANN distance to BF16 Euclidean and uses BF16 vectors. |
| dataset/dataset_test.go | Updates dataset item label expectations to BF16-encoded embeddings; adds mixed-type input case. |
| dataset/dataset.go | Converts numeric []any label slices to BF16 during label processing. |
| common/bfloats/bfloats_test.go | Adds tests for bfloats.FromAny. |
| common/bfloats/bfloats.go | Adds FromAny and numeric conversion helper for BF16 ingestion. |
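
The BF16 Euclidean distance mentioned for logics/item_to_item.go can be sketched as below. This is an assumption about the general approach (widen each element to float32, then accumulate squared differences), not the project's implementation, which may use vectorized code; `widen` and `euclideanBF16` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// widen (hypothetical helper) turns a BF16 bit pattern into a float32 by
// placing it in the upper 16 bits and zero-filling the rest.
func widen(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// euclideanBF16 (hypothetical name) computes the Euclidean distance between
// two BF16-encoded vectors. Arithmetic is done in float32 because BF16 is
// used here only as a storage format.
func euclideanBF16(a, b []uint16) float32 {
	if len(a) != len(b) {
		panic("euclideanBF16: dimension mismatch")
	}
	var sum float32
	for i := range a {
		d := widen(a[i]) - widen(b[i])
		sum += d * d
	}
	return float32(math.Sqrt(float64(sum)))
}

func main() {
	a := []uint16{0x3F80, 0x4000} // 1.0, 2.0
	b := []uint16{0x4080, 0x40C0} // 4.0, 6.0
	fmt.Println(euclideanBF16(a, b))
}
```

Keeping vectors in BF16 and widening only inside the distance loop means the ANN index holds half the bytes per vector while the accumulation still runs at float32 precision.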

Comment thread model/ctr/data.go
Comment thread common/bfloats/bfloats.go
Comment thread model/ctr/data_test.go
@zhenghaoz zhenghaoz merged commit 383b815 into gorse-io:master Apr 21, 2026
17 checks passed

3 participants