-
Notifications
You must be signed in to change notification settings - Fork 97
feat(backend): multi-hop payments with static routing #3566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for brilliant-pasca-3e80ec canceled.
|
🚀 Performance Test ResultsTest Configuration:
Test Metrics:
📜 Logs
|
Just checking in with some observations to follow up on the issues @sanducb mentioned in the call last week. These were:
I ran the tenanted open payments flow, which always completed but sometimes showed these error logs:
I rang the non-tenanted open payments flow and the first time saw the create quote command take ~10 seconds then return an
Then a successive try to create quote worked. The rest of the flow worked as well. Then I tried again, and the entire flow worked. I tore everything down including the volumes and retried and saw an error in bruno on the grant request for incoming payment ( |
I was thinking about how cumbersome the localenv options were getting when adding the localenv script for creating the multitenancy environment. Perhaps something we could do in the future is have a bash script that takes in options modularly like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't finished reviewing everything, but wanted to share the comments I have already
localenv/cloud-nine-wallet/seed.yml
Outdated
peerUrl: http://happy-life-bank-backend:3002 | ||
peerIlpAddress: test.happy-life-bank | ||
- initialLiquidity: '1000000000000' | ||
peerUrl: http://intergalactic-bank-backend:3002 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this breaks the most basic local environment setup. There might need to be different docker-compose
files for the multihop environment that point to different seed files, so that cloud-nine-wallet
can route to happy-life-bank
with or without any hops in the middle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Current localenv
setup only makes sense for the purpose of testing the multi-hop logic. I will think about a way of making multi-hop coexist with all the other setups and bring it up here for discussion. Your suggestion here sounds like a good avenue to explore.
packages/backend/src/payment-method/ilp/connector/ilp-routing/test/service.test.ts
Outdated
Show resolved
Hide resolved
…github.com/interledger/rafiki into multihop-static-routing-v1
…ltihop-static-routing-v1
…nto multihop-static-routing-v1
I addressed some comments in the latest commits and made some adjustments that I detailed in the Notes section of the PR description. Please check the updated description before re-reviewing. @njlie @BlairCurrey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I tested the Bruno collection and it seems to be working. I just had a small note about the tests
) | ||
}) | ||
|
||
test('Updates peer routes in router service', async (): Promise<void> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good for the update tests to also ensure that the prior routes were cleared before the new ones are synced with a .toHaveBeenCalled()
check or something.
…w ones in peers update test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and appears to be working as expected locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All working for me!
I'm thinking, though, to think to keep our local playground docs accurate (and keep the local playground simple), we should have the pnpm localenv:compose
be the same, such that it only starts up two Rafiki nodes.
What we can do is have a pnpm localenv:compose:multihop
, which starts up the three nodes as you have them now, and because you updated the UpdatePeer
Mutation, we can just have the seed script update the Peering routes on start, such that we "enable" the multi hop functionality without having to do much else :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think deleting a peer will be difficult given the FK constraints + the accounting data. (and also maybe we shouldn't for audit reasons). Instead, we could create an additional multihop
docker compose (kind of how the merged
docker compose file gets added to the pnpm command) which configures the containers to use a separate seed files. In the seed files, the peer which will be skipped for routing will just have "unreachable" ILP address & routes, kind of how you did it for global bank for the cloud nine wallet seed file. This makes it possible to not have to have any custom code in the seed script for global bank
if (options.routes !== undefined) { | ||
const staticIlpAddress = | ||
options.staticIlpAddress ?? existingPeer.staticIlpAddress | ||
updateData.routes = [staticIlpAddress, ...options.routes] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the staticIlpAddress
already added in syncPeerRoutes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in syncPeerRoutes
we already add the staticIlpAddress
to the routing table. Here we are making sure that we also update the routes db entry with the peer's static address since that is used here when loading the routes from the db. Redundancy in syncPeerRoutes
exists to make sure that there is no way to not have the static address in the routing table (even if the peer might be modified to have invalid routes) by taking the address from the peer entity.
Changes proposed in this pull request
outgoing-payment
,incoming-payment
,routing
orunknown
(rate probes fall into this category)Context
Closes #3444
Overall setup and routing logic
The setup creates 3 instances where instance A is peered with B, B is peered with C (please check the setup for exact instance names).
Payments should be successful from A -> B -> C by using the existing Bruno collection. Instances A and C were kept as
cloud-nine
andhappy-life-bank
in order to have minimal changes of the existing setup.At startup of a Rafiki instance, routes are loaded from the database and stored in the in memory routing table. All subsequent peer updates will also refresh the routing table. For backwards compatibility, if no routes exist then direct peers' address and asset id will be used to populate the routing table.
A routing table entry has the following structure:
|
tenantId:destination
|next hop
|asset id
|where:
tenantId
is the tenant id of the callerdestination
is the static ILP address of the payment receiver.next hop
is the peer id of the direct peer that will either route or be the destination of the packetasset id
is the asset id of the next hop peer -> this field is mandatory when adding/removing a route but not when querying for the next hop, as one could or could not be interested in what asset the peering relationship has when forwarding the packet.tenantId:destination
is calledprefix
in the implementation and is the key of the table. Longest prefix matching is done against this key.The routing logic is now also responsible for resolving the peering asymmetry issue described here in a multi-tenanted environment.
Telemetry
There are 2 key metrics added in this PR:
ilp_prepare_packet_processing_ms
: Measures the time it takes to process individual ILP prepare packets through the connector middleware and is a histogram with a label that denotes the operation of the packet (outgoing_payment
,incoming_payment
,routing
,unknown
-> which includes rate probes). In the ILP metrics Grafana dashboard you can see P50 and P95 percentiles panels for tracking latency.ilp_payment_round_trip_ms
: Measures the round-trip time for completing ILP payment (on the sender side). This one is also a histogram and the average round-trip time can be seen in the dashboard.Local testing
Only non-tenanted setup will now have the multi-hop feature with 3 instances.
Use any of the non-tenanted Open Payments Bruno collections as-is to test this flow.
Run
pnpm localenv:compose:multihop up
to spin it locally.Notes
maxPacketAmount
was added to the seed is that if it is not set, then rate probes will lock a big part of a peer's balance during quoting that will only be moved/released it on receiving fulfill/reject. The total amount locked is usually10^13 + 10^12 + ... + 10^3
(adding each probe packet's decreasing value). This causes issues with payments throughput, as even a few payments could lock the whole balance of a peer. We want to mitigate that by settingmaxPacketAmount
to a reasonable value, such that rate probe packets will not lock any value until they match the expectedmaxPacketAmount
. Therefore, expect to see error logs (caused by reject packets) forAmountTooLargeError
when quoting until rate probes will match themaxPacketAmount
set. Onlyglobal-bank
needsmaxPacketAmount
set since this is relevant for "receiving" instances when quoting.