Add ext txn hash to update api#4199
Conversation
[ci] Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
Pull request overview
Adds support for propagating an externally-signed transaction hash through Scan’s update history HTTP API encoding/decoding and API schema, so clients can correlate external submissions with committed transactions.
Changes:
- Extend update-history transaction encodings to include `externalTransactionHash` (HTTP string ↔ Ledger API `ByteString`).
- Plumb `externalTransactionHash` through v1→v2 update conversion in the Scan HTTP handler.
- Update OpenAPI schema and adjust tests to cover the new field.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodings.scala | Adds encoding/decoding of externalTransactionHash between Ledger API transactions and HTTP update-history payloads. |
| apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/HttpScanHandler.scala | Ensures the hash is preserved when converting update-history transactions to the v2 representation. |
| apps/scan/src/main/openapi/scan.yaml | Documents the new external_transaction_hash field on update-history transaction schemas. |
| apps/scan/src/test/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodingsTest.scala | Extends round-trip encoding tests to include an external transaction hash. |
| apps/common/src/test/scala/org/lfdecentralizedtrust/splice/store/StoreTestBase.scala | Updates test transaction builder to accept an external transaction hash. |
Comments suppressed due to low confidence (4)
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodings.scala:47
`hexHashOption` hand-rolls ByteString → hex encoding via string formatting, while decoding uses `HexFormat.parseHex`. There is already a shared `com.digitalasset.canton.util.HexString` utility in the codebase for hex encoding/decoding; using it here would keep encoding/decoding symmetric and avoid maintaining custom formatting logic.
private def hexHashOption(extTxnHash: ByteString): Option[String] = {
  if (extTxnHash.isEmpty) {
    None
  } else {
    Some(extTxnHash.toByteArray.map("%02x" format _).mkString)
  }
}
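A minimal sketch of the symmetric alternative the comment asks for, using `java.util.HexFormat` (JDK 17+) for the encoding direction as well. `Array[Byte]` stands in for the protobuf `ByteString`, and the object name is illustrative, not the actual Splice code:

```scala
import java.util.HexFormat

// Hedged sketch: encode the external hash with the same JDK utility that
// the decoding side uses, so the round trip stays consistent.
object HexHashEncoding {
  // Empty hash bytes mean "no external hash", matching hexHashOption above.
  def hexHashOption(extTxnHash: Array[Byte]): Option[String] =
    if (extTxnHash.isEmpty) None
    else Some(HexFormat.of().formatHex(extTxnHash))
}
```

`HexFormat.formatHex` emits lowercase hex, which matches the `"%02x"` formatting in the original.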
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodings.scala:271
Decoding `externalTransactionHash` via `HexFormat.of().parseHex` will throw an `IllegalArgumentException` on invalid input, which will bubble up without context. Consider using the existing hex parsing utility (e.g., `HexString.parseToByteString`) and failing with a clearer error message if the string is not valid hex.
http.externalTransactionHash
.map(hex => ByteString.copyFrom(HexFormat.of().parseHex(hex)))
.getOrElse(ByteString.EMPTY),
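A hedged sketch of the safer decoding path (illustrative names, not the actual handler code): wrap the `IllegalArgumentException` that `HexFormat.parseHex` throws into a descriptive error instead of letting it bubble up. `Array[Byte]` again stands in for the protobuf `ByteString`:

```scala
import java.util.HexFormat
import scala.util.{Failure, Success, Try}

// Hedged sketch: parse the hex-encoded external_transaction_hash and
// surface invalid input as a descriptive Left instead of an exception.
object HexHashDecoding {
  def parseHashHex(hex: String): Either[String, Array[Byte]] =
    Try(HexFormat.of().parseHex(hex)) match {
      case Success(bytes) => Right(bytes)
      case Failure(e) =>
        Left(s"external_transaction_hash is not valid hex: '$hex' (${e.getMessage})")
    }
}
```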
apps/scan/src/main/openapi/scan.yaml:2388
- Grammar: "For transaction externally signed" is ungrammatical/ambiguous. Consider rewording to something like "For externally signed transactions" (and optionally clarify that the value is a hex-encoded hash).
For transaction externally signed, contains the external transaction hash
signed by the external party. Can be used to correlate an external submission with a committed transaction.
apps/scan/src/main/openapi/scan.yaml:2451
- Grammar: "For transaction externally signed" is ungrammatical/ambiguous. Consider rewording to something like "For externally signed transactions" (and optionally clarify that the value is a hex-encoded hash).
For transaction externally signed, contains the external transaction hash
signed by the external party. Can be used to correlate an external submission with a committed transaction.
PR itself LGTM, though there is one concern I have wrt backfilling (which is already the case in #4109 afaict):
external_transaction_hash is now introduced for /v0/backfilling/updates-before, i.e., we will backfill the external_transaction_hash.
There's now the situation where e.g.:
- sv-1 and sv-2 upgrade to the version that has this code
- sv-3 and sv-4 are slower to upgrade so they don't for a few hours
- an externally signed transaction comes => sv-1 and sv-2 set the hash, sv-3 and sv-4 don't
How should:
- A client doing BFT reads from Scan handle that?
- A newly onboarded / recovered SV doing backfilling?
The problem being that for these 4 SVs (f=1), there's not really consensus as to whether the transaction does or does not have a transaction hash, so the client will randomly get one or the other.
(the answer might very well be that I'm missing some chunk of code for this, or that it doesn't matter and we're willing to accept this for a while)
  rightChildId2 -> mkCreate(rightChildId2),
),
externalTransactionHash =
  Option("4d68f590e4a298d9617ebe07b98c6ecbe04b7f3d7a5327f0e0ad4719638302b7"),
- Option("4d68f590e4a298d9617ebe07b98c6ecbe04b7f3d7a5327f0e0ad4719638302b7"),
+ Some("4d68f590e4a298d9617ebe07b98c6ecbe04b7f3d7a5327f0e0ad4719638302b7"),
Let's also have @rautenrieth-da have a look at this, especially the backfilling side of things, when he is back tomorrow (or when convenient).
The more I think about it, the more I think that is the case. The BFT connection will never fail, it will just say that there's a transaction or that there isn't. For a period of time whether or not there's a hash might be inconsistent, but that's unlikely to matter given all the previous transactions do not have one.
rautenrieth-da left a comment
Looks good in general, but I am worried about the responses not being BFT safe. If every SV starts including the hashes at a different time, you would get a chunk of bulk storage history where none of the SVs agree on the hash. That seems worse than the updates API endpoint, where you always have a majority that agrees on the response.
a chunk of bulk storage history where none of the SVs agree on the hash
@isegall-da @ray-roestenburg-da would this be acceptable? If not, the scan app needs to have a record time (from the config or an on-ledger contract) that defines after which time the new field must be included. This value will have to be kept forever because it needs to be respected for backfilling.
workflowId: Option[String],
commandId: Option[String],
eventId: String,
externalTransactionHash: Array[Byte],
This column is nullable in the schema, can we make it Option[Array[Byte]] here, and let users handle missing values?
It will be an empty array if it is missing. Otherwise you have None, empty, and non-empty that you would have to deal with.
the SVs will agree that it's either empty, or set (never disagree on the actual value if it's defined), meaning that there would be always
That's only the case for the updates endpoint. I was talking about bulk storage, where we write a large number of updates into one file, and then take the hash of the file contents. Unless all updates within the file are bitwise identical, the hash of the whole file will differ.
gah, my bad. Indeed that's a problem.
Quite problematic! We definitely plan on bulk storage being identical across all SVs so that comparing hashes of objects suffices.
@rautenrieth-da Could you please have a look?
rautenrieth-da left a comment
Looks good in general! I'd just like to check why we decided to always include hashes in the pointwise lookup.
txWithMigration,
encoding = encoding,
version = if (consistentResponses) ScanHttpEncodings.V1 else ScanHttpEncodings.V0,
hashInclusionPolicy = ExternalHashInclusionPolicy.AlwaysInclude,
Can you please remind me, why did we decide to always include the external hash for pointwise lookups? This will make the getUpdateById method not BFT safe, and it will return data inconsistent with the update stream.
My assumption is that vX/updates/{update_id} and vX/updates/hashes/{hash} should always include the hash. My assumption might be wrong; it's a valid point that once we expose it, we don't know how users are going to use it.
Does this mean BFT safety must be applied to all API endpoints, including updates by hash (vX/updates/hashes/{hash})? If so, we have to communicate this to the product team.
IMO all endpoints should be BFT safe by default, as users may not trust any single scan and do a BFT read for every scan request. We could exempt individual responses fields from BFT safety, but that does not work for clients that use the whole opaque response when checking response equality (like we do in the validator scan proxy).
For this reason I would keep at least vX/updates/{update_id} BFT safe.
For vX/updates/hashes/{hash} I'm less sure, I guess here it's OK if we always include the hash.
As the product team has no specification for the vX/updates/{update_id} endpoint, this approach should be fine. They are concerned about having the vX/updates/hashes/{hash} endpoint.
So, we can go ahead with:
vX/updates: BFT Safe
vX/updates/{update_id} : BFT Safe
vX/updates/hashes/{hash}: Not BFT Safe
So, taking this approach, WDYT about mentioning that vX/updates/hashes/{hash} is not BFT safe in the API documentation?
The main request has been to be able to lookup a transaction by transaction id, so I think the get by update id API does not have to include the hash. If we need that for some reason, we can add a request parameter so people can decide they want that returned in the response as well, and we can explain that in certain cases they might get different responses from SVs though only for some period, iiuc.
Please document it as such, and I would assume at some point we'll know the actual threshold date and can add that. Does that also mean we can include the hash in get-by-update-id if we know the threshold in the system, right?
We have already set a conservative threshold date of 2030/01/01, which we can change later as needed.
The APIs are going to be like below:
vX/updates: BFT Safe. Hashes created after the threshold date will be included in the response.
vX/updates/{update_id} : BFT Safe. Hashes created after the threshold date will be included in the response.
vX/updates/hashes/{hash}: Not BFT Safe. Supported irrespective of the created date.
Does that answer your question?
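A minimal sketch of that threshold rule (all names here are illustrative, not the actual Splice types): include the external hash only for updates whose record time is at or after the configured threshold, conservatively 2030-01-01 per the discussion above.

```scala
import java.time.Instant

// Hedged sketch of the threshold-date rule: the external hash is exposed
// in responses only for updates recorded at or after the threshold, so all
// SVs agree on inclusion regardless of when each one upgraded.
object HashThreshold {
  final case class HashInclusionConfig(threshold: Instant)

  def includeExternalHash(recordTime: Instant, cfg: HashInclusionConfig): Boolean =
    !recordTime.isBefore(cfg.threshold)
}
```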
not safe only for the hashes created before the threshold date
It's even slightly better, the exact behavior for vX/updates/hashes/{hash} should be:
There are record times R0 and R1 where:
- For transactions committed before R0, all scans will return 404
- For transactions committed between R0 and R1, some scans will return 404 and some scans will return a transaction
- For transactions committed after R1, all scans will return a transaction
- If two scans return a non-404 response, then the response payloads will be identical
- The difference between R0 and R1 is expected to be in the order of a few days
- R1 is expected to be weeks before the cutoff date after which transaction hashes will start to appear in the v2/updates stream
"not BFT safe for the hashes created before a threshold date" sounds good as a short description.
I would assume at some point we'll know the actual threshold date
Yes, but it depends on the network and changes after network resets.
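The R0/R1 behavior above can be sketched as a small classifier (illustrative names only, not actual Splice code): given a transaction's record time, it says what a client should expect from the set of scans on a by-hash lookup.

```scala
import java.time.Instant

// Hedged sketch of the R0/R1 lookup behavior described above.
object HashLookupBehavior {
  sealed trait Expected
  case object AllScansReturn404 extends Expected
  case object MixedResponses extends Expected // some scans 404, some return the transaction
  case object AllScansReturnTx extends Expected

  def expected(recordTime: Instant, r0: Instant, r1: Instant): Expected =
    if (recordTime.isBefore(r0)) AllScansReturn404
    else if (recordTime.isBefore(r1)) MixedResponses
    else AllScansReturnTx
}
```

Note that even in the mixed window, any two non-404 responses are identical, so a BFT reader only has to tolerate 404s, never conflicting payloads.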
There was a problem hiding this comment.
@jagathweerasinghe-da I would document that vX/updates/hashes/{hash} is BFT safe for hashes that were computed for transactions created after the threshold date. It is not, by definition, 'not BFT safe'.
Great, we will add this in the next PR, which brings the vX/updates/hashes/{hash} endpoint.
}

sealed trait BackfillingRequirement
nit: avoid useless whitespace changes
@rautenrieth-da PTAL.