Collaborator

@alexandru-uta alexandru-uta commented Sep 25, 2025

This PR adds (tagged) blob deduplication support. The main issue it solves is that all external calls to a Motoko canister go through Candid deserialization, so blobs passed as arguments end up as fresh blobs on the Motoko heap. Calling a canister multiple times with the same blob as an argument therefore creates multiple copies of that blob.

To achieve deduplication, this PR does the following:

  • in internals.mo, it creates a fixed-size hash table that resolves collisions via chaining.
  • it sets up a thin RTS interface to set/get the hash table allocated in internals.mo, so that the RTS layer tracks the table: it is not garbage collected and it survives upgrades.
  • the hash table stores weak references pointing to the actual objects; once the objects are garbage collected, the weak references point to null.
  • it adds a thin client interface (in prim.mo) to walk the hash table, check which deduplicated blobs are alive or dead, and prune the dead ones if needed.
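The design described in the list above can be illustrated with a small, self-contained Rust model. This is only a sketch under stated assumptions, not the code from this PR: `DedupTable`, `intern`, `stats`, `prune`, the bucket count, and the FNV-1a hash are all hypothetical, and Rust's `Rc`/`Weak` stand in for Motoko's garbage-collected blobs and weak references (a dead `Weak` fails to upgrade, which plays the role of a weak reference pointing to null).

```rust
use std::rc::{Rc, Weak};

// Hypothetical fixed bucket count; the real table size is not stated in the PR text.
const BUCKETS: usize = 8;

/// Toy dedup table: a fixed number of buckets, each a chain of weak references.
struct DedupTable {
    buckets: Vec<Vec<Weak<[u8]>>>,
}

/// FNV-1a hash reduced to a bucket index (an assumed hash, chosen for brevity).
fn bucket_of(bytes: &[u8]) -> usize {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    (h % BUCKETS as u64) as usize
}

impl DedupTable {
    fn new() -> Self {
        DedupTable { buckets: (0..BUCKETS).map(|_| Vec::new()).collect() }
    }

    /// Return the live blob with the same bytes if one exists,
    /// otherwise allocate a fresh blob and remember it weakly.
    fn intern(&mut self, bytes: &[u8]) -> Rc<[u8]> {
        let chain = &mut self.buckets[bucket_of(bytes)];
        for weak in chain.iter() {
            if let Some(live) = weak.upgrade() {
                if &*live == bytes {
                    return live; // reuse the existing copy
                }
            }
        }
        let fresh: Rc<[u8]> = Rc::from(bytes);
        chain.push(Rc::downgrade(&fresh));
        fresh
    }

    /// Walk the table and count (alive, dead) entries.
    fn stats(&self) -> (usize, usize) {
        let (mut alive, mut dead) = (0, 0);
        for weak in self.buckets.iter().flatten() {
            if weak.strong_count() > 0 { alive += 1 } else { dead += 1 }
        }
        (alive, dead)
    }

    /// Drop entries whose target has been freed; returns how many were removed.
    fn prune(&mut self) -> usize {
        let mut removed = 0;
        for chain in self.buckets.iter_mut() {
            let before = chain.len();
            chain.retain(|weak| weak.strong_count() > 0);
            removed += before - chain.len();
        }
        removed
    }
}

fn main() {
    let mut table = DedupTable::new();
    let a = table.intern(b"hello");
    let b = table.intern(b"hello");
    assert!(Rc::ptr_eq(&a, &b)); // the second call reuses the first blob

    let c = table.intern(b"world");
    drop(c); // "garbage collect" the only strong reference
    assert_eq!(table.stats(), (1, 1));
    assert_eq!(table.prune(), 1);
}
```

Because the table holds only weak references, it never keeps a blob alive on its own; dead entries linger in their chains until an explicit prune, which matches the alive/dead walk described in the last bullet.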

Contributor

github-actions bot commented Sep 25, 2025

Comparing from ae9b575 to 832b0e5:
In terms of gas, 5 tests regressed and the mean change is +0.1%.
In terms of size, 5 tests regressed and the mean change is +1.9%.

/// Setter method for the dedup table.
pub(crate) unsafe fn set_dedup_table_ptr(dedup_table: Value) {
    let metadata = PersistentMetadata::get();
    (*metadata).dedup_table = dedup_table;
Contributor

I believe here a write barrier would be needed.

Collaborator Author

fixed, thank you!
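For readers unfamiliar with the term, the idea behind the barrier requested in this thread can be sketched with a toy model. Everything below is hypothetical and is not the Motoko RTS API: `Gc`, `write_with_barrier`, `Metadata`, and the `Value` stand-in are invented for illustration, and the sketch assumes a snapshot-at-the-beginning style (pre-update) barrier, which may differ from what the actual fix uses.

```rust
/// Hypothetical stand-in for a tagged heap pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Value(usize);

/// Toy collector state: whether a mark phase is running, plus its work list.
struct Gc {
    marking: bool,
    mark_stack: Vec<Value>,
}

impl Gc {
    /// Pre-update write barrier (assumed design): while marking, remember the
    /// value that is about to be overwritten so the collector still visits it,
    /// then perform the store.
    fn write_with_barrier(&mut self, slot: &mut Value, new_value: Value) {
        if self.marking {
            self.mark_stack.push(*slot);
        }
        *slot = new_value;
    }
}

/// Toy version of a metadata record holding the dedup table pointer.
struct Metadata {
    dedup_table: Value,
}

/// Setter that routes the pointer store through the barrier
/// instead of assigning the field directly.
fn set_dedup_table(gc: &mut Gc, metadata: &mut Metadata, table: Value) {
    gc.write_with_barrier(&mut metadata.dedup_table, table);
}

fn main() {
    let mut gc = Gc { marking: true, mark_stack: Vec::new() };
    let mut metadata = Metadata { dedup_table: Value(1) };
    set_dedup_table(&mut gc, &mut metadata, Value(2));
    assert_eq!(metadata.dedup_table, Value(2));
    assert_eq!(gc.mark_stack, vec![Value(1)]); // old pointer was recorded
}
```

The point of the sketch is only that a plain field assignment bypasses the collector's bookkeeping, whereas a store routed through the barrier lets a concurrent or incremental collector observe the change.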

Contributor

@luc-blaeser luc-blaeser left a comment

Great stuff. Just the one comment on barrier

@alexandru-uta alexandru-uta changed the title from "experiment: First approach at deduping blobs" to "feat: Deduplicate tagged blobs" Oct 7, 2025
@alexandru-uta alexandru-uta marked this pull request as ready for review October 7, 2025 16:03
@alexandru-uta alexandru-uta requested a review from a team as a code owner October 7, 2025 16:03
@alexandru-uta alexandru-uta merged commit eea6ebc into master Oct 8, 2025
20 checks passed
@alexandru-uta alexandru-uta deleted the alexuta/dedup-objs branch October 8, 2025 16:12
luc-blaeser pushed a commit that referenced this pull request Oct 14, 2025