
Proposal features: 16 -> 32 bit range check on LogUp #702

@hero78119

Description


Background

Ceno relies heavily on LogUp for range checks, e.g. the 16-bit range check.
See the circuit statistics in #585 (comment)

Most lookup operations come from the 16-bit range check. Taking the add opcode as an example, its 9 lookups break down as:

  • rs1, rs2, rd offline memory checks assert timestamp < global timestamp => (3 * 2) = 6 of the 16-bit range checks
  • rd split into 2 of the 16-bit range checks
  • program fetch: 1 lookup
    So overall there are 6 + 2 + 1 = 9 lookups
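To make the limb accounting concrete, here is a minimal Python sketch (names are illustrative, not Ceno code) of why a 32-bit value such as rd costs two lookups against a 16-bit table:

```python
# Hypothetical sketch: with only a 16-bit range table, each 32-bit
# witness must be split into two limbs, each costing one lookup.
def limbs_16(value: int) -> list[int]:
    """Split a 32-bit value into two 16-bit limbs (low, high)."""
    assert 0 <= value < 1 << 32
    return [value & 0xFFFF, value >> 16]

rd = 0xDEADBEEF
lo, hi = limbs_16(rd)
assert lo == 0xBEEF and hi == 0xDEAD
# Each limb is one lookup into the 16-bit table -> 2 lookups,
# versus a single lookup if a 32-bit table were available.
```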

If we use a 32-bit range check:

  • rs1, rs2, rd offline memory check timestamp checks => 3 lookups
  • rd range: 1 lookup
  • program fetch: 1 lookup
    Overall that is 5 lookups, which crosses the 2^3 = 8 boundary. The leaf layer of the tower sumcheck will be cut in half, so the expected latency should also be roughly halved.
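The two counts can be sanity-checked with a few lines of arithmetic (a sketch using the numbers from this proposal, not actual circuit code):

```python
# Per-add-opcode lookup counts from the proposal.
lookups_16bit = 3 * 2 + 2 + 1   # ts checks in 16-bit limbs + rd limbs + fetch
lookups_32bit = 3 + 1 + 1       # one lookup each once the table is 32-bit
assert lookups_16bit == 9 and lookups_32bit == 5

# The tower sumcheck leaf layer is padded to a power of two, so dropping
# from 9 (padded to 16) to 5 (padded to 8) lookups halves the leaf layer.
def pad_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

assert pad_pow2(lookups_16bit) == 16
assert pad_pow2(lookups_32bit) == 8
```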

As a side effect, we also save a bunch of witnesses (witin) that were only there to hold 16-bit limbs, which also benefits mPCS since there are fewer polynomials.

Design Rationales

On the right-hand side of the LogUp formula we have m(x) and T(x). One nice property of the 32-bit range check table T(x) is that we can skip its commitment & PCS, since the verifier can evaluate T(r) succinctly via the tricks here. The remaining challenge is how to deal with the huge & sparse polynomial m(x).
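As an illustration of why T(r) is succinct (my sketch, assuming the range table entry at boolean index (b_0, ..., b_31) is the integer Σ b_i·2^i, so its multilinear extension is linear in each variable):

```python
# Sketch: the multilinear extension of the identity/range table is
# T(r) = sum_i 2^i * r_i, so the verifier evaluates it in O(num_bits)
# field operations with no commitment needed.
P = (1 << 64) - (1 << 32) + 1  # Goldilocks prime (assumed base field)

def eval_range_table(r: list[int]) -> int:
    """T(r) = sum_i 2^i * r_i mod p; r has one entry per table bit."""
    return sum((1 << i) * ri for i, ri in enumerate(r)) % P

# Sanity check on a boolean point: little-endian bits of index 11 = 0b1011.
bits = [1, 1, 0, 1]  # 1 + 2 + 8 = 11
assert eval_range_table(bits) == 11
```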

Via the Spartan paper (p. 29, 7.2.2, the sparse polynomial commitment SPARK), we can view the sparse m(x) as a tuple of 3 dense polynomials [(i, j, M(i,j))] and commit to the 3 dense polynomials respectively, given that the original dense size of m(x) is 2^32. The key insight of SPARK is that by splitting into i, j polynomials, each polynomial's size just matches the number of non-zero entries of m(x). I think the most innovative part of breaking the variables into row, col is in SPARK's offline memory check (memory-in-the-head): it reduces the audit_ts_(row/col) dense size from 2^32 to 2^16.

Originally my question was why row + col instead of just row — is it only to deal with R1CS matrices? After some thought I found it's not: the key point of splitting into row, col is that the |audit_ts| size can be reduced exponentially, from 2^32 to 2^16. The math magic is that the identity polynomial eq is splittable into row and col halves, each with an exponential size reduction, and each half can be evaluated separately.
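The splittability of eq is easy to check numerically; a small sketch (assuming Goldilocks as the base field):

```python
import random

P = (1 << 64) - (1 << 32) + 1  # Goldilocks prime (assumed base field)

def eq(r, x):
    """Multilinear eq polynomial: prod_i (r_i*x_i + (1-r_i)*(1-x_i)) mod p."""
    out = 1
    for ri, xi in zip(r, x):
        out = out * ((ri * xi + (1 - ri) * (1 - xi)) % P) % P
    return out

random.seed(0)
r = [random.randrange(P) for _ in range(32)]
x = [random.randrange(2) for _ in range(32)]

# eq over 32 variables factors into a row half and a col half, each over
# 16 variables -- this is what lets audit_ts shrink from 2^32 to 2^16.
assert eq(r, x) == eq(r[:16], x[:16]) * eq(r[16:], x[16:]) % P
```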

The prover needs to commit to i, j, M, along with read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col for the SPARK protocol.

With SPARK, the e2e table proving flow will look like this:

  • the prover commits to the original polynomials related to the table proof, plus a few polynomials for SPARK: i, j, M, read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col
  • derive challenges
  • generate the tower sumcheck proof; at the end, for the LogUp m(x) we have a random point $r$ and evaluation v = m($r$)
  • generate the mPCS proof for the original polynomials
  • generate spark_proof(r, v=m($r$), [i, j, M, read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col])
    • tower_product_sumcheck: for the row, col offline memory checks following SPARK
    • one sumcheck: v = $\sum_k e_{row}(k) * e_{col}(k) * M(k)$
    • check that the offline memory check formulas for row, col are satisfied
    • derive a new random point $r'$ and 9 evaluations i($r'$), j($r'$), M($r'$), read_ts_row($r'$), write_ts_row($r'$), ...
  • generate the mPCS proof for the SPARK protocol
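A toy end-to-end check of the sparse evaluation identity used in the flow above (a sketch on a 2^4-sized m(x) rather than 2^32; names are illustrative, not Ceno APIs):

```python
import random

P = (1 << 64) - (1 << 32) + 1  # Goldilocks prime (assumed base field)

def eq(r, x):
    """Multilinear eq: prod_i (r_i*x_i + (1-r_i)*(1-x_i)) mod p."""
    out = 1
    for ri, xi in zip(r, x):
        out = out * ((ri * xi + (1 - ri) * (1 - xi)) % P) % P
    return out

def bits(v, n):
    """Little-endian bit decomposition of v into n bits."""
    return [(v >> k) & 1 for k in range(n)]

# Toy instance: m over 4 variables (2 col + 2 row), 3 non-zero entries.
entries = [(0, 1, 7), (2, 3, 5), (3, 0, 9)]  # (row i, col j, multiplicity M)
n_row = n_col = 2

# Dense evaluation: walk the full 2^4 table (what SPARK avoids at 2^32).
table = [0] * (1 << (n_row + n_col))
for i, j, M in entries:
    table[(i << n_col) | j] = M

random.seed(1)
r = [random.randrange(P) for _ in range(n_row + n_col)]
dense = sum(eq(r, bits(k, 4)) * table[k] for k in range(16)) % P

# Sparse evaluation: v = sum_k e_col(k) * e_row(k) * M(k) over non-zero
# entries only; eq factors because bits(k) = bits(j) ++ bits(i).
sparse = sum(eq(r[:n_col], bits(j, n_col)) * eq(r[n_col:], bits(i, n_row)) * M
             for i, j, M in entries) % P
assert dense == sparse
```

The point of the toy: the dense sum touches all 16 entries, while the sparse sum touches only the 3 non-zero ones, yet both compute the same m(r).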

What's the overhead?

In the table proof, since a new SPARK proof flow is involved sequentially in the critical path, the overall latency of the table proof will increase. However, all the opcode proofs will benefit quite a lot from the 32-bit range check. As the opcode proofs account for the major cost, the increased overhead in the table proof is probably negligible.

In a more detailed analysis, the proving time overhead in SPARK is dominated by the sizes |read_ts_row|, |write_ts_row|, |read_ts_col|, |write_ts_col|, which match the number of non-zero entries of m(x). In real-world workloads, the more repeated values there are to range check, the fewer non-zero entries m(x) has, so the cost savings can be significant. The worst case happens when all looked-up values are distinct.
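A tiny illustration of this cost model (hypothetical numbers): the SPARK cost scales with the number of distinct looked-up values, not with the raw number of lookups:

```python
# m(x) in sparse form is just a multiplicity count per looked-up value.
from collections import Counter

lookups = [3, 3, 3, 7, 7, 42]   # 6 lookups, but only 3 distinct values
m = Counter(lookups)            # sparse m(x): value -> multiplicity

assert len(m) == 3                       # non-zero entries; SPARK cost driver
assert sum(m.values()) == len(lookups)   # total lookups are preserved
```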

Sub task breakdown

Other side effects

This feature relies on the base field being able to hold 32-bit RV32 values, therefore we need to stick with Goldilocks64.
So the future roadmap will be Goldilocks64 -> binary field, without Mersenne31/BabyBear in transition.
