# add swip-25: pullsync protocol improvement #66
---
SWIP: 25
title: More efficient pull syncing within neighbourhood
author: Viktor Tron <@zelig>, Viktor Tóth <@nugaon>
discussions-to: https://discord.com/channels/799027393297514537/1239813439136993280
status: Draft
type: Standards Track (Core)
created: 2025-02-24
---

<!--You can leave these HTML comments in your merged SWIP and delete the visible duplicate text guides, they will not appear and may be helpful to refer to if you edit it again. This is the suggested template for new SWIPs. Note that a SWIP number will be assigned by an editor. When opening a pull request to submit your SWIP, please use an abbreviated title in the filename, `SWIP-draft_title_abbrev.md`. The title should be 44 characters or less.-->

## Simple Summary
<!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the SWIP.-->
This SWIP describes a more efficient way to synchronise content between peers in the same neighbourhood.

### Glossary

- **Pull-sync**: A protocol responsible for syncing all the chunks that all nodes within a neighbourhood need to store in their reserve. The protocol itself is well established and shall not change.
- **Pivot**: Strategies of pull-syncing involve the perspective of a particular node, the **pivot node**, and concern the algorithm that dictates which particular address bins and binID ranges the pivot should be requesting from their peers.
- **Proximity Order (PO)**: A measure of proximity: the number of matching leading bits that are common to (the big-endian binary representations of) two addresses.
- **Reserve**: The network-wide reserve is the set of chunks pushed to the network with a valid postage stamp.
- **Bin X of M**: Bin $X$ of a node $M$ contains all the chunks in the network reserve whose PO with $M$'s address is exactly $X$: $\mathrm{Bin}_X(M) := \lbrace c\in\mathrm{Reserve}\mid\mathit{PO}(\mathit{Addr}(c), \mathit{Addr}(M)) = X\rbrace$.
- **A's Neighbourhood of depth D**: An address range, elements of which share at least $D$ leading bits with $A$: $\mathrm{NH}_D(A) := \lbrace c \in \mathrm{Chunks}\mid \mathit{PO}(\mathit{Addr}(c), A) \geq D\rbrace$. Alternatively, if $A$ is the address of node $M$, the chunks in $M$'s neighbourhood of depth $D$ can also be expressed as the union of all $M$'s bins at and beyond $D$: $\mathrm{NH}_D(\mathit{Addr}(M)) = \bigcup_{X\geq D} \mathrm{Bin}_X(M)$.
- **Storage depth**: The smallest integer $D$ such that $2^D$ neighbourhoods of depth $D$ (each holding a disjoint replication set of all their bins $X$ with $X \geq D$) are able to accommodate the network reserve. Assuming uniform utilisation across neighbourhoods and a node reserve depth of $t$, $D_s := \lceil \log_2(N) \rceil - t$.
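
The PO and bin definitions above can be sketched in Python. This is an illustrative helper (the function names `proximity_order` and `bin_of` are hypothetical, not part of any client implementation):

```python
def proximity_order(a: bytes, b: bytes) -> int:
    """Number of matching leading bits of two equal-length addresses."""
    po = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff == 0:
            po += 8          # whole byte matches
            continue
        # count matching leading bits within the first differing byte
        for bit in range(7, -1, -1):
            if diff & (1 << bit):
                return po
            po += 1
    return po  # identical addresses: every compared bit matches


def bin_of(chunk_addr: bytes, node_addr: bytes) -> int:
    """Bin X of node M holds the chunks whose PO with M's address is exactly X."""
    return proximity_order(chunk_addr, node_addr)
```

So a chunk belongs to exactly one bin of a given node, and the neighbourhood of depth $D$ is simply all chunks whose bin index is at least $D$.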
## Abstract
<!--A short (~200 word) description of the technical issue being addressed.-->
If a node connects to Swarm as a full node, it fires up the pullsync protocol, which is responsible for syncing all the chunks the node needs to store. Currently, the algorithm makes sure that on each peer connection, both parties try to synchronise their entire reserve. More precisely, each peer starts streaming the chunk hashes in batches for each proximity order that is greater than or equal to the pull-sync depth (usually the neighbourhood depth). This proposal offers a much more efficient algorithm that is still capable of replicating the reserve.
## Motivation
<!--The motivation is critical for SWIPs that want to change the Swarm protocol. It should clearly explain why the existing protocol specification is inadequate to address the problem that the SWIP solves. SWIP submissions without sufficient motivation may be rejected outright.-->
Imagine that a naive peer joins a neighbourhood: they will 'subscribe to' each depth of their peers within the neighbourhood. As they receive new chunks, they of course offer these back to the peers they got them from. Moreover, they try to synchronise the entire reserve from each peer, not just a part of it, which means a naive node's synchronisation involves the exchange of `N*S` chunk hashes, where `N` is the neighbourhood size and `S` is the size of the reserve. This is hugely inefficient.
## Specification
<!--The technical specification should describe the syntax and semantics of any new feature. The specification should be detailed enough to allow competing, interoperable implementations for the current Swarm platform and future client implementations.-->
Each peer `P` takes all the peers it is allowed to synchronise with: `p_0, p_1, ..., p_n`.
All chunks need to be synchronised only once.
How about we synchronise each chunk from its closest peer among the neighbourhood peers?
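
The closest-peer rule can be sketched as follows. This is a minimal illustration with hypothetical names (`assign_sync_sources` is not part of the protocol; the real pull-sync operates on bin and binID subscriptions rather than explicit per-chunk loops):

```python
def proximity_order(a: bytes, b: bytes) -> int:
    """Number of matching leading bits of two equal-length addresses."""
    po = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff == 0:
            po += 8
            continue
        for bit in range(7, -1, -1):
            if diff & (1 << bit):
                return po
            po += 1
    return po


def assign_sync_sources(chunk_addrs, peer_addrs):
    """Assign every chunk to the single neighbourhood peer closest to it
    (highest PO), so each chunk is requested from exactly one peer."""
    return {
        chunk: max(peer_addrs, key=lambda p: proximity_order(chunk, p))
        for chunk in chunk_addrs
    }
```

Under this scheme each chunk hash crosses the wire once rather than once per neighbour, reducing the exchange from `N*S` towards `S` hashes (ties in PO would need a deterministic tie-break in practice).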

---

**Copilot AI** commented on lines +42 to +43 (Oct 27, 2025). Suggested change:

> \- All chunks need to be syncronized only once.
> \- How about we syncronize each chunks from its closest peer among the neighborhood peers.
> \+ All chunks need to be synchronized only once.
> \+ How about we synchronize each chunk from its closest peer among the neighborhood peers.

**Copilot AI** commented (Oct 27, 2025):

Corrected spelling of 'stretegy' to 'strategy'.

> \- Unlike the earlier algorithm, this one is extremely sensitive to the changing peerset, so every single time there is a change in the neighbours, pullsync stretegy needs to be reevaluated.
> \+ Unlike the earlier algorithm, this one is extremely sensitive to the changing peerset, so every single time there is a change in the neighbours, pullsync strategy needs to be reevaluated.
A reviewer commented:

> it needs recalculating all UDs and maybe add or drop bin subscriptions at each peer in my understanding.
> It is a bit vague what process the other peer should start.

Reply:

> what I meant is that then we would need to start the same sync process with another peer, and that will be from the start if the peer is new.

Reply:

> I don't get this @nugaon

**Copilot AI** commented (Oct 27, 2025):

Corrected spelling of 'neeeded' to 'needed' and 'retrievebility' to 'retrievability'. Additionally, 'cos' should be 'because' in formal documentation.

> \- Thorough testing is neeeded, cos this can produce inconsistencies in the localstore and has major impact for retrievebility.
> \+ Thorough testing is needed, because this can produce inconsistencies in the localstore and has major impact for retrievability.

**Copilot AI** commented (Oct 27, 2025):

Corrected spelling of 'syncronized' to 'synchronized'.

> \- all chunks of leaf nodes must be syncronized from its stored peer.
> \+ all chunks of leaf nodes must be synchronized from its stored peer.

**Copilot AI** commented (Oct 27, 2025):

Remove trailing slash from 'Copyright/' header.

> \- ## Copyright/
> \+ ## Copyright

**Copilot AI** commented:

Corrected spelling of 'hashrd' to 'hashes'.