Skip to content

Add ext txn hash to update api#4199

Merged
jagathweerasinghe-da merged 42 commits intomainfrom
wee/add_ext_txn_hash_to_update_api
Mar 25, 2026
Merged

Add ext txn hash to update api#4199
jagathweerasinghe-da merged 42 commits intomainfrom
wee/add_ext_txn_hash_to_update_api

Conversation

@jagathweerasinghe-da
Copy link
Contributor

Pull Request Checklist

Cluster Testing

  • If a cluster test is required, comment /cluster_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
  • If a hard-migration test is required (from the latest release), comment /hdm_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.

PR Guidelines

  • Include any change that might be observable by our partners or affect their deployment in the release notes.
  • Specify fixed issues with Fixes #n, and mention issues worked on using #n
  • Include a screenshot for frontend-related PRs - see README or use your favorite screenshot tool

Merge Guidelines

  • Make the git commit message look sensible when squash-merging on GitHub (most likely: just copy your PR description).

[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for propagating an externally-signed transaction hash through Scan’s update history HTTP API encoding/decoding and API schema, so clients can correlate external submissions with committed transactions.

Changes:

  • Extend update-history transaction encodings to include externalTransactionHash (HTTP string ↔ Ledger API ByteString).
  • Plumb externalTransactionHash through v1→v2 update conversion in the Scan HTTP handler.
  • Update OpenAPI schema and adjust tests to cover the new field.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodings.scala Adds encoding/decoding of externalTransactionHash between Ledger API transactions and HTTP update-history payloads.
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/HttpScanHandler.scala Ensures the hash is preserved when converting update-history transactions to the v2 representation.
apps/scan/src/main/openapi/scan.yaml Documents the new external_transaction_hash field on update-history transaction schemas.
apps/scan/src/test/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodingsTest.scala Extends round-trip encoding tests to include an external transaction hash.
apps/common/src/test/scala/org/lfdecentralizedtrust/splice/store/StoreTestBase.scala Updates test transaction builder to accept an external transaction hash.
Comments suppressed due to low confidence (4)

apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodings.scala:47

  • hexHashOption hand-rolls ByteString -> hex encoding via string formatting, while decoding uses HexFormat.parseHex. There is already a shared com.digitalasset.canton.util.HexString utility in the codebase for hex encoding/decoding; using it here would keep encoding/decoding symmetric and avoids maintaining custom formatting logic.
  private def hexHashOption(extTxnHash: ByteString): Option[String] = {
    if (extTxnHash.isEmpty) {
      None
    } else {
      Some(extTxnHash.toByteArray.map("%02x" format _).mkString)
    }

apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/ScanHttpEncodings.scala:271

  • Decoding externalTransactionHash via HexFormat.of().parseHex will throw an IllegalArgumentException on invalid input, which will bubble up without context. Consider using the existing hex parsing utility (e.g., HexString.parseToByteString) and failing with a clearer error message if the string is not valid hex.
            http.externalTransactionHash
              .map(hex => ByteString.copyFrom(HexFormat.of().parseHex(hex)))
              .getOrElse(ByteString.EMPTY),

apps/scan/src/main/openapi/scan.yaml:2388

  • Grammar: "For transaction externally signed" is ungrammatical/ambiguous. Consider rewording to something like "For externally signed transactions" (and optionally clarify that the value is a hex-encoded hash).
            For transaction externally signed, contains the external transaction hash
            signed by the external party. Can be used to correlate an external submission with a committed transaction.

apps/scan/src/main/openapi/scan.yaml:2451

  • Grammar: "For transaction externally signed" is ungrammatical/ambiguous. Consider rewording to something like "For externally signed transactions" (and optionally clarify that the value is a hex-encoded hash).
            For transaction externally signed, contains the external transaction hash
            signed by the external party. Can be used to correlate an external submission with a committed transaction.

[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
Copy link
Contributor

@OriolMunoz-da OriolMunoz-da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR itself LGTM,though there is one concern I have wrt backfilling (which is already the case in #4109 afaict):

external_transaction_hash is now introduced for /v0/backfilling/updates-before, i.e., we will backfill the external_transaction_hash.

There's now the situation where e.g.:

  • sv-1 and sv-2 upgrade to the version that has this code
  • sv-3 and sv-4 are slower to upgrade so they don't for a few hours
  • an externally signed transaction comes => sv-1 and sv-2 set the hash, sv-3 and sv-4 don't

How should:

  • A client doing BFT reads from Scan handle that?
  • A newly onboarded / recovered SV doing backfilling?

The problem being that for these 4 SVs (f=1), there's not really consensus as to whether the transaction does or does not have a transaction hash, so the client will randomly get one or the other.

(the answer might very well be that I'm missing some chunk of code for this, or that it doesn't matter and we're willing to accept this for a while)

cc @ray-roestenburg-da

rightChildId2 -> mkCreate(rightChildId2),
),
externalTransactionHash =
Option("4d68f590e4a298d9617ebe07b98c6ecbe04b7f3d7a5327f0e0ad4719638302b7"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Option("4d68f590e4a298d9617ebe07b98c6ecbe04b7f3d7a5327f0e0ad4719638302b7"),
Some("4d68f590e4a298d9617ebe07b98c6ecbe04b7f3d7a5327f0e0ad4719638302b7"),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ray-roestenburg-da
Copy link
Contributor

Let's also have @rautenrieth-da have a look at this especially the backfilling side of things when he is back tomorrow (or when convenient)

@OriolMunoz-da
Copy link
Contributor

The more I think about it, the more I think

or that it doesn't matter and we're willing to accept this for a while

is the case. The BFT connection will never fail, it will just say that there's a transaction or that there isn't. For a period of time whether or not there's a hash might be inconsistent, but that's unlikely to matter given all the previous transactions do not have one.

Copy link
Contributor

@rautenrieth-da rautenrieth-da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, but I am worried about the responses not being BFT safe. If every SV starts including the hashes at a different time, you would get a chunk of bulk storage history where none of the SVs agree on the hash. That seems more bad than the updates API endpoint where you always have a majority that agrees on the response.

a chunk of bulk storage history where none of the SVs agree on the hash

@isegall-da @ray-roestenburg-da would this be acceptable? If not, the scan app needs to have a record time (from the config or an on-ledger contract) that defines after which time the new field must be included. This value will have to be kept forever because it needs to be respected for backfilling.

workflowId: Option[String],
commandId: Option[String],
eventId: String,
externalTransactionHash: Array[Byte],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This column is nullable in the schema, can we make it Option[Array[Byte]] here, and let users handle missing values?

Copy link
Contributor

@ray-roestenburg-da ray-roestenburg-da Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be an empty array if it is missing. Otherwise you have None, empty, and non-empty that you would have to deal with.

@OriolMunoz-da
Copy link
Contributor

Looks good in general, but I am worried about the responses not being BFT safe. If every SV starts including the hashes at a different time, you would get a chunk of bulk storage history where none of the SVs agree on the hash. That seems more bad than the updates API endpoint where you always have a majority that agrees on the response.

the SVs will agree that it's either empty, or set (never disagree on the actual value if it's defined), meaning that there would be always f+1 agreeing on None or Some(x)
the only problem would be that two different BFT-calls might return a different value... but I'd say that's probably fine during the upgrade window

@rautenrieth-da
Copy link
Contributor

the SVs will agree that it's either empty, or set (never disagree on the actual value if it's defined), meaning that there would be always f+1 agreeing on None or Some(x)

That's only the case for the updates endpoint. I was talking about bulk storage, where we write a large number of updates into one file, and then take the hash of the file contents. Unless all updates within the file are bitwise identical, the hash of the whole file will differ.

@OriolMunoz-da
Copy link
Contributor

I was talking about bulk storage, where we write a large number of updates into one file, and then take the hash of the file contents.

gah, my bad. Indeed that's a problem.
Just to reiterate, effectively the problem already starts with #4109. Let me know if we have to roll it back for tomorrow's release to avoid mismatching tx hashes.

[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
@isegall-da
Copy link
Contributor

a chunk of bulk storage history where none of the SVs agree on the hash

Quite problematic! We definitely plan on bulk storage being identical across all SVs so that comparing hashes of objects suffices.

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[static]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
…te' into wee/add_ext_txn_hash_to_update_api

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
…ashes_and_expose_with_threshold_date

[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[static]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
@jagathweerasinghe-da
Copy link
Contributor Author

@rautenrieth-da Could you please have a look?
Here is the summary:

  • hashes are always added to: vX/updates/{update_id}
  • hashes are added after thresholdDate to other APIs: e.g. vX/updates

[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
Copy link
Contributor

@rautenrieth-da rautenrieth-da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general! I'd like to only check why we decided to always include hashes in the pointwise lookup

txWithMigration,
encoding = encoding,
version = if (consistentResponses) ScanHttpEncodings.V1 else ScanHttpEncodings.V0,
hashInclusionPolicy = ExternalHashInclusionPolicy.AlwaysInclude,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please remind me, why did we decide to always include the external hash for pointwise lookups? This will make the getUpdateById method not BFT safe, and it will return data inconsistent with the update stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption is vX/updates/{update_id} and vX/updates/hashes/{hash} should always include;
My assumption might be wrong; Its a valid point, once we expose it we don't how the users are going to use it.

Does this mean BFT safety must be applied for all API endpoints, including updates by hash (vX/updates/hashes/{hash})? If so, we have to communicate this to the product team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO all endpoints should be BFT safe by default, as users may not trust any single scan and do a BFT read for every scan request. We could exempt individual responses fields from BFT safety, but that does not work for clients that use the whole opaque response when checking response equality (like we do in the validator scan proxy).

For this reason I would keep at least vX/updates/{update_id} BFT safe.

For vX/updates/hashes/{hash} I'm less sure, I guess here it's OK if we always include the hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the product team has no specification for vX/updates/{update_id} endpoint, this approach should be fine. They are concerned about having the vX/updates/hashes/{hash} endpoint.

So, we can go ahead with:

vX/updates: BFT Safe
vX/updates/{update_id} : BFT Safe
vX/updates/hashes/{hash}: Not BFT Safe

So, taking this approach, WDYT about mentioning that vX/updates/hashes/{hash} is not BFT safe in the API documentation?

Copy link
Contributor

@ray-roestenburg-da ray-roestenburg-da Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main request has been to be able to lookup a transaction by transaction id, so I think the get by update id API does not have to include the hash. If we need that for some reason, we can add a request parameter so people can decide they want that returned in the response as well, and we can explain that in certain cases they might get different responses from SVs though only for some period, iiuc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document it as such, and I would assume at some point we'll know the actual threshold date and can add that. Does that also mean we can include the has in by update id if we know the threshold in the system right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have already set a conservative threshold date of 2030/01/01, which we can change later as needed.
The APIs are going to be like below:

vX/updates: BFT Safe. Hashes created after the threshold date will be included in the response.
vX/updates/{update_id} : BFT Safe. Hashes created after the threshold date will be included in the response.
vX/updates/hashes/{hash}: Not BFT Safe. Supported irrespective of the created date.

Does that answer your question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not safe only for the hashes created before the threshold date

It's even slightly better, the exact behavior for vX/updates/hashes/{hash} should be:

There are record times R0 and R1 where:

  • For transactions commited before R0, all scans will return 404
  • For transactions committed between R0 and R1, some scans will return 404 and some scans will return a transaction
  • For transactions committed after R1, all scans will return a transaction
  • If two scans return a non-404 response, then the response payloads will be identical
  • The difference between R0 and R1 is expected to be in the order of a few days
  • R1 is expected to be weeks before the cutoff date after which transaction hashes will start to appear in the v2/updates stream

"not BFT safe for the hashes created before a threshold date" sounds good as a short description.

I would assume at some point we'll know the actual threshold date

Yes, but it depends on the network and changes after network resets.

Copy link
Contributor

@ray-roestenburg-da ray-roestenburg-da Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jagathweerasinghe-da I would document, vX/updates/hashes/{hash} is BFT safe for hashes that were computed for transactions that were created after the threshold date. It's not, by definition, 'not-BFT safe'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not safe only for the hashes created before the threshold date

It's even slightly better, the exact behavior for vX/updates/hashes/{hash} should be:

There are record times R0 and R1 where:

  • For transactions commited before R0, all scans will return 404
  • For transactions committed between R0 and R1, some scans will return 404 and some scans will return a transaction
  • For transactions committed after R1, all scans will return a transaction
  • If two scans return a non-404 response, then the response payloads will be identical
  • The difference between R0 and R1 is expected to be in the order of a few days
  • R1 is expected to be weeks before the cutoff date after which transaction hashes will start to appear in the v2/updates stream

"not BFT safe for the hashes created before a threshold date" sounds good as a short description.

I would assume at some point we'll know the actual threshold date

Yes, but it depends on the network and changes after network resets.

Great, we will add this in the next PR, which brings the vX/updates/hashes/{hash} endpoint.

}

sealed trait BackfillingRequirement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: avoid useless whitespace changes

[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
@jagathweerasinghe-da
Copy link
Contributor Author

@rautenrieth-da PTAL.

[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
[ci]

Signed-off-by: Jagath Weerasinghe <jagath.weerasinghe@digitalasset.com>
Copy link
Contributor

@rautenrieth-da rautenrieth-da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jagathweerasinghe-da jagathweerasinghe-da merged commit d44b71b into main Mar 25, 2026
67 checks passed
@jagathweerasinghe-da jagathweerasinghe-da deleted the wee/add_ext_txn_hash_to_update_api branch March 25, 2026 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants