
Conversation

@danicuki (Contributor) commented Dec 4, 2025

No description provided.

@zdave-parity (Collaborator) left a comment


The changes look correct to me, just a few minor comments.

\newcommand{\spl}[1]{\text{split}_{#1}}

- The foundation of the data-availability and distribution system of \Jam is a systematic Reed-Solomon erasure coding function in \textsc{gf}($2^{16}$) of rate 342:1023, the same transform as done by the algorithm of \cite{lin2014novel}. We use a little-endian $\blob[2]$ form of the 16-bit \textsc{gf} points with a functional equivalence given by $\fnencode[2]$. From this we may assume the encoding function $\fnerasurecode: \sequence[342]{\blob[2]} \to \sequence[1023]{\blob[2]}$ and the recovery function $\fnecrecover: \protoset{\tuple{\blob[2], \Nmax{1023}}}_{342} \to \sequence[342]{\blob[2]}$. Encoding is done by extrapolating a data blob of size 684 octets (provided in $\fnerasurecode$ here as 342 octet pairs) into 1,023 octet pairs. Recovery is done by collecting together any distinct 342 octet pairs, together with their indices, and transforming this into the original sequence of 342 octet pairs.
+ The foundation of the data-availability and distribution system of \Jam is a systematic Reed-Solomon erasure coding function in \textsc{gf}($2^{16}$) of rate $\nicefrac{\Cecpiecesize}{2}$:$\Cvalcount$, the same transform as done by the algorithm of \cite{lin2014novel}. We use a little-endian $\blob[2]$ form of the 16-bit \textsc{gf} points with a functional equivalence given by $\fnencode[2]$. From this we may assume the encoding function $\fnerasurecode: \sequence[\nicefrac{\Cecpiecesize}{2}]{\blob[2]} \to \sequence[\Cvalcount]{\blob[2]}$ and the recovery function $\fnecrecover: \protoset{\tuple{\blob[2], \Nmax{\Cvalcount}}}_{\nicefrac{\Cecpiecesize}{2}} \to \sequence[\nicefrac{\Cecpiecesize}{2}]{\blob[2]}$. Encoding is done by extrapolating a data blob of size $\Cecpiecesize$ octets (provided in $\fnerasurecode$ here as $\nicefrac{\Cecpiecesize}{2}$ octet pairs) into $\Cvalcount$ octet pairs. Recovery is done by collecting together any distinct $\nicefrac{\Cecpiecesize}{2}$ octet pairs, together with their indices, and transforming this into the original sequence of $\nicefrac{\Cecpiecesize}{2}$ octet pairs.
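As a reader aid (not part of the diff), here is a minimal Rust sketch of the shapes involved, using hypothetical constant names that mirror $\Cecpiecesize = 684$ and $\Cvalcount = 1023$ from the hunk above. It implements only the trivial systematic case of recovery; the general case needs the full transform of \cite{lin2014novel}.

```rust
/// A 16-bit GF(2^16) point in little-endian two-octet form.
type Point = [u8; 2];

// Hypothetical names mirroring the graypaper constants:
// \Cecpiecesize = 684 (piece size in octets), \Cvalcount = 1023.
const PIECE_SIZE: usize = 684;
const ORIGINAL_SHARDS: usize = PIECE_SIZE / 2; // 342 octet pairs
const TOTAL_SHARDS: usize = 1023;              // one chunk per validator

/// Trivial-case recovery: because the code is systematic, output points
/// 0..342 are the input data itself (the remaining 681 are parity), so
/// if the received (point, validator index) pairs cover indices 0..342
/// the piece reassembles with no field arithmetic at all. Returns None
/// when parity chunks would be needed, i.e. when the general decoder
/// must take over.
fn recover_systematic(chunks: &[(Point, usize)]) -> Option<Vec<Point>> {
    let mut original: Vec<Option<Point>> = vec![None; ORIGINAL_SHARDS];
    for &(point, index) in chunks {
        debug_assert!(index < TOTAL_SHARDS);
        if index < ORIGINAL_SHARDS {
            original[index] = Some(point);
        }
    }
    // Collect fails (yields None) if any systematic index is missing.
    original.into_iter().collect()
}
```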
@zdave-parity (Collaborator) commented:

All the /2s make this a bit unpleasant to read. Maybe sensible to introduce another constant for the number of original shards, or change the meaning of W_E to this and use 2W_E for piece size? This is a style question that is probably best answered by Gav though.

@danicuki (Contributor, Author) commented Dec 8, 2025

@zdave-parity I totally agree with you. @gavofyork what is your suggestion?

- Once done, then imported segments must be reconstructed. This process may in fact be lazy as the Refine function makes no usage of the data until the \emph{fetch} host-call is made. Fetching generally implies that, for each imported segment, erasure-coded chunks are retrieved from enough unique validators (342, including the guarantor) and is described in more depth in appendix \ref{sec:erasurecoding}. (Since we specify systematic erasure-coding, its reconstruction is trivial in the case that the correct 342 validators are responsive.) Chunks must be fetched for both the data itself and for justification metadata which allows us to ensure that the data is correct.
+ Once done, then imported segments must be reconstructed. This process may in fact be lazy as the Refine function makes no usage of the data until the \emph{fetch} host-call is made. Fetching generally implies that, for each imported segment, erasure-coded chunks are retrieved from enough unique validators ($\nicefrac{\Cecpiecesize}{2}$, including the guarantor) and is described in more depth in appendix \ref{sec:erasurecoding}. (Since we specify systematic erasure-coding, its reconstruction is trivial in the case that the correct $\nicefrac{\Cecpiecesize}{2}$ validators are responsive.) Chunks must be fetched for both the data itself and for justification metadata which allows us to ensure that the data is correct.

Validators, in their role as availability assurers, should index such chunks according to the index of the segments-tree whose reconstruction they facilitate. Since the data for segment chunks is so small at 12 octets, fixed communications costs should be kept to a bare minimum. A good network protocol (out of scope at present) will allow guarantors to specify only the segments-tree root and index together with a Boolean to indicate whether the proof chunk need be supplied. Since we assume at least 341 other validators are online and benevolent, we can assume that the guarantor can compute $\importsegmentdata$ and $\justifysegmentdata$ above with confidence, based on the general availability of data committed to with $\mathbf{s}^\clubsuit$, which is specified below.
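For concreteness, a minimal Rust sketch of the two ideas in the excerpt above: a chunk request carrying only the segments-tree root, index, and proof flag, and a segment whose reconstruction is deferred until the \emph{fetch} host-call. All names and types here are hypothetical; the graypaper specifies behaviour, not an API, and the network protocol is explicitly out of scope.

```rust
/// Hypothetical request shape per the paragraph above: the segments-tree
/// root and index identify the chunk, and a flag indicates whether the
/// justification (proof) chunk should be supplied as well.
struct SegmentChunkRequest {
    segments_tree_root: [u8; 32],
    segment_index: u32,
    with_justification: bool,
}

/// An imported segment whose bytes are reconstructed lazily: Refine
/// never touches the data until the `fetch` host-call, so chunk
/// retrieval can be deferred until this point.
struct ImportedSegment {
    segments_tree_root: [u8; 32],
    segment_index: u32,
    data: Option<Vec<u8>>, // populated on first fetch
}

impl ImportedSegment {
    fn fetch(&mut self) -> &[u8] {
        if self.data.is_none() {
            // Hypothetical: gather 342 distinct chunks (the guarantor's
            // own included) and run the recovery function; trivial when
            // the responsive validators hold the systematic chunks.
            let req = SegmentChunkRequest {
                segments_tree_root: self.segments_tree_root,
                segment_index: self.segment_index,
                with_justification: true,
            };
            self.data = Some(retrieve_and_recover(&req));
        }
        self.data.as_deref().unwrap()
    }
}

/// Placeholder for the network + decode path described in the text.
fn retrieve_and_recover(_req: &SegmentChunkRequest) -> Vec<u8> {
    unimplemented!("out of scope: network protocol and RS recovery")
}
```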
@zdave-parity (Collaborator) commented:

341 here should be $\nicefrac{\Cecpiecesize}{2} - 1$ I guess, though that is a bit of a mouthful.

@danicuki (Contributor, Author) replied:

maybe $\nicefrac{\Cecpiecesize}{2} - 1 = 341$ ?
