
Commit a6d6f62

faran928 authored and facebook-github-bot committed
Swap IntNBit TBE Kernel with SSD Embedding DB TBE Kernel for SSD Inference Enablement (#3134)
Summary:
Pull Request resolved: #3134

For SSD inference, we have added EmbeddingDB as a custom in-house storage that is not exposed to OSS. We leverage the TGIF stack to rewrite the IntNBit TBE kernel with the SSD EmbeddingDB TBE kernel, since the SSD TBE embedding kernel can't be exposed within the TorchRec code base.

Additionally, for SSD we only provide the di_sharding_pass, and SSD can be enabled without having additional DI shards. In that case, the tables assigned to the CPU host can simply be table-wise (TW) sharded, and the TW sharding logic was added accordingly. The diff also includes an option to manually enable TW sharding using the DI / Universal Sharding Pass logic, which can be used to override the automated TW universal sharding behavior in case that is less efficient (as it is still being improved).

Reviewed By: gyllstromk

Differential Revision: D76953960

fbshipit-source-id: b2cef7eb118dd5f94242b85c2a931a4d365d58d4
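
The kernel swap itself happens in the internal TGIF stack, but the general pattern is a module rewrite over the sharded model: walk the module tree, find each IntNBit TBE lookup, and splice in an SSD-backed replacement built from the same fused params. The sketch below is a minimal illustration of that pattern, not the internal pass; is_intnbit_tbe and build_ssd_tbe are hypothetical callables standing in for the non-OSS EmbeddingDB kernel factory.

from typing import Any, Callable, Dict, List, Tuple

from torch import nn


def swap_tbe_kernels(
    model: nn.Module,
    is_intnbit_tbe: Callable[[nn.Module], bool],
    build_ssd_tbe: Callable[[nn.Module, Dict[str, Any]], nn.Module],
) -> nn.Module:
    # fused_params kept on the sharded module by the one-line change below.
    fused_params: Dict[str, Any] = getattr(model, "_fused_params", None) or {}

    # Collect (parent, attribute name, child) triples first so the swap does
    # not mutate the tree while it is being traversed.
    to_swap: List[Tuple[nn.Module, str, nn.Module]] = []
    for parent in model.modules():
        for name, child in parent.named_children():
            if is_intnbit_tbe(child):
                to_swap.append((parent, name, child))

    # Replace each IntNBit TBE lookup with an SSD-backed equivalent.
    for parent, name, child in to_swap:
        setattr(parent, name, build_ssd_tbe(child, fused_params))
    return model

In the actual stack this rewrite runs as an internal publish-time pass, which is why the only OSS-visible change in this diff is keeping fused_params on the sharded module (see below).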
1 parent a580e35 commit a6d6f62


1 file changed: +1 -0 lines changed


torchrec/distributed/quant_embeddingbag.py

Lines changed: 1 addition & 0 deletions
@@ -224,6 +224,7 @@ def __init__(
         self._is_weighted: bool = module.is_weighted()
         self._lookups: List[nn.Module] = []
         self._create_lookups(fused_params, device)
+        self._fused_params = fused_params

         # Ensure output dist is set for post processing from an inference runtime (ie. setting device from runtime).
         self._output_dists: torch.nn.ModuleList = torch.nn.ModuleList()
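
For the table-wise fallback described in the summary, the internal di_sharding_pass decides placement; in OSS TorchRec a similar effect, pinning the tables left on the CPU host to table-wise sharding, can be approximated with planner constraints. A minimal sketch under that assumption (the table names are hypothetical, and this uses the public ParameterConstraints / ShardingType APIs rather than the internal pass):

from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

# Hypothetical names of tables that the SSD placement step leaves on the CPU host.
cpu_host_tables = ["table_user_history", "table_ad_features"]

# Constrain those tables to table-wise (TW) sharding; all remaining tables
# keep the planner's default sharding proposals.
constraints = {
    name: ParameterConstraints(sharding_types=[ShardingType.TABLE_WISE.value])
    for name in cpu_host_tables
}

# The constraints dict would then be passed to the sharding planner
# (e.g. EmbeddingShardingPlanner(..., constraints=constraints)) when the plan
# for SSD inference is generated.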
