-
Notifications
You must be signed in to change notification settings - Fork 54
feat: incremental-hasher #261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 4 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,4 +1,5 @@ | ||||||||||||||||||||||||||
| // # Multihash | ||||||||||||||||||||||||||
| import type { MulticodecCode } from '../block/interface.js' | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Represents a multihash digest which carries information about the | ||||||||||||||||||||||||||
|
|
@@ -9,7 +10,7 @@ | |||||||||||||||||||||||||
| // a bunch of places that parse it to extract (code, digest, size). By creating | ||||||||||||||||||||||||||
| // this first class representation we avoid reparsing and things generally fit | ||||||||||||||||||||||||||
| // really nicely. | ||||||||||||||||||||||||||
| export interface MultihashDigest<Code extends number = number> { | ||||||||||||||||||||||||||
| export interface MultihashDigest<Code extends MulticodecCode = MulticodecCode, Size extends number = number> { | ||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Code of the multihash | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
|
|
@@ -23,7 +24,7 @@ export interface MultihashDigest<Code extends number = number> { | |||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * byte length of the `this.digest` | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| size: number | ||||||||||||||||||||||||||
| size: Size | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Binary representation of this multihash digest. | ||||||||||||||||||||||||||
|
|
@@ -35,7 +36,7 @@ export interface MultihashDigest<Code extends number = number> { | |||||||||||||||||||||||||
| * Hasher represents a hashing algorithm implementation that produces as | ||||||||||||||||||||||||||
| * `MultihashDigest`. | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| export interface MultihashHasher<Code extends number = number> { | ||||||||||||||||||||||||||
| export interface MultihashHasher<Code extends MulticodecCode = MulticodecCode> { | ||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Takes binary `input` and returns it (multi) hash digest. Return value is | ||||||||||||||||||||||||||
| * either promise of a digest or a digest. This way general use can `await` | ||||||||||||||||||||||||||
|
|
@@ -67,6 +68,67 @@ export interface MultihashHasher<Code extends number = number> { | |||||||||||||||||||||||||
| * `SyncMultihashHasher` is useful in certain APIs where async hashing would be | ||||||||||||||||||||||||||
| * impractical e.g. implementation of Hash Array Mapped Trie (HAMT). | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| export interface SyncMultihashHasher<Code extends number = number> extends MultihashHasher<Code> { | ||||||||||||||||||||||||||
| export interface SyncMultihashHasher<Code extends MulticodecCode = MulticodecCode> extends MultihashHasher<Code> { | ||||||||||||||||||||||||||
| digest: (input: Uint8Array) => MultihashDigest<Code> | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Incremental variant of the `MultihashHasher` that can be used to compute | ||||||||||||||||||||||||||
| * digest of the payloads that would be impractical or impossible to load all | ||||||||||||||||||||||||||
| * into a memory. | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| export interface IncrementalMultihashHasher< | ||||||||||||||||||||||||||
| Code extends MulticodecCode, | ||||||||||||||||||||||||||
| Size extends number, | ||||||||||||||||||||||||||
| Digest = MultihashDigest<Code, Size> | ||||||||||||||||||||||||||
| > { | ||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Size of the digest this hasher produces. | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| size: Size | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Code of the multihash | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| code: Code | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Name of the multihash | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| name: string | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Number of bytes that were consumed. | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| count(): bigint | ||||||||||||||||||||||||||
|
Comment on lines
+100
to
+103
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Let's just drop this method, we can revisit if we find it really necessary. |
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Returns multihash digest of the bytes written so far. Should not have | ||||||||||||||||||||||||||
| * side-effects, meaning you should be able to write some more bytes and | ||||||||||||||||||||||||||
| * call `digest` again to get the digest for all the bytes written from | ||||||||||||||||||||||||||
| * creation (or from reset) | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| digest(): Digest | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Computes the digest of the given input and writes it into the provided | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
| * Computes the digest of the given input and writes it into the provided | |
| * Computes the digest of the bytes written so far and writes it into the provided |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return output? Not sure how useful it is to chain this method.
| digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this | |
| digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): Uint8Array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there's a better convention for method name when receiving a parameter to mutate. digestInto is a bit awkwardly named (IMO!).
digestBYOB? 🙄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asMultihash - do we need? The point of this library is multiformats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went into this in the linked issue. In practice I have encountered many instances where I do need to leave out the prefix, I could go and use another library in those instances, but seems like we could just expose this. That sayid I agree that this method is kind of meh and we could do better.
@vasco-santos suggested having a whole another method, personally I'm wondering if perhaps we should have methods just to get digest without a prefix as this is low level API anyway ?
| digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this | |
| // multihash prefix for it | |
| header: Uint8Array | |
| // only writes the digest without a prefix | |
| digestInto(output: Uint8Array, offset?: number): this |
Or alternatively something like this
| digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this | |
| encodeMultihashInto(target: Uint8Array, offset?: number): this | |
| encodeMultihashHeaderInto(target: Uint8Array, offset?: number): this | |
| encodeDigestInto(target: Uint8Array, offset?: number): thsi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
digestIntois a bit awkwardly named (IMO!).
I'm happy to call this something, else I was trying to align with varint.encodeInto, which as it turns out was called varint.encodeTo 😅
digestBYOB? 🙄
Works for me although I had to google to figure out what BYOB stand for.
Alternatively we could just have read(bytes: Uint8Array, offfset?: number): this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return output? Not sure how useful it is to chain this method.
I personally find mutations that return values misleading, that said I'm amendable to the idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to google to figure out what BYOB stand for.
BYOB was aiming for familiarity with https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader etc. but I guess that was a miss 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically in streaming hashers this is called update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with calling it update although I do find that name confusing personally as I think of update as overwrite as opposed to append.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but streaming hashers aren't appending to a buffer they are updating their internal state with the new data you pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if someone is hashing >9PiB of data in JS then 👏👏👏.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ... is this overkill?