Traffic-based App Rewards: Compute aggregateActivityTotals, add ComputeAppRewardTrigger, expose new endpoints#4420
Conversation
meiersi-da
left a comment
There was a problem hiding this comment.
Thanks a lot. Sorry for the delay in the review.
There seems to have been a misunderstanding wrt featured app parties lookup. See my comments. Let's make another pass once the current comments have been addressed.
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/HttpScanHandler.scala
Show resolved
Hide resolved
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/HttpScanHandler.scala
Outdated
Show resolved
Hide resolved
...can/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/db/DbScanAppRewardsStore.scala
Outdated
Show resolved
Hide resolved
...rc/main/scala/org/lfdecentralizedtrust/splice/scan/automation/RewardComputationTrigger.scala
Outdated
Show resolved
Hide resolved
...rc/main/scala/org/lfdecentralizedtrust/splice/scan/automation/RewardComputationTrigger.scala
Outdated
Show resolved
Hide resolved
...rc/main/scala/org/lfdecentralizedtrust/splice/scan/automation/RewardComputationTrigger.scala
Outdated
Show resolved
Hide resolved
...can/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/db/DbScanAppRewardsStore.scala
Outdated
Show resolved
Hide resolved
...can/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/db/DbScanAppRewardsStore.scala
Outdated
Show resolved
Hide resolved
cac78b8 to
bc955b6
Compare
|
@meiersi-da PTAL, I've addressed the current set of comments |
meiersi-da
left a comment
There was a problem hiding this comment.
Thanks a lot. We are getting closer. The code continues to require a high attention to detail. Not an easy one to write or review ;)
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/ScanAppRewardsStore.scala
Outdated
Show resolved
Hide resolved
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/ScanAppRewardsStore.scala
Outdated
Show resolved
Hide resolved
...rc/main/scala/org/lfdecentralizedtrust/splice/scan/automation/RewardComputationTrigger.scala
Outdated
Show resolved
Hide resolved
...rc/main/scala/org/lfdecentralizedtrust/splice/scan/automation/RewardComputationTrigger.scala
Outdated
Show resolved
Hide resolved
| tasks <- (lastClosedO, earliestCompleteO) match { | ||
| case (Some((lastClosed, _)), Some(earliestComplete)) => | ||
| appRewardsStore | ||
| .getNextRoundWithoutRootHash(updateHistory.historyId, lastClosed) |
There was a problem hiding this comment.
This feels off as a function. It seems to do too much.
Why not query the latest round for which the computation completed; and also query the latest round with complete data. Then the tasks available are all the round from max(earliestComplete, latestRoundWithRootHash + 1) to min(lastClosed, latestComplete) (with some adjustments for the optionals).
There was a problem hiding this comment.
You can then just pick the first 4 round numbers from this interval.
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/HttpScanHandler.scala
Show resolved
Hide resolved
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/admin/http/HttpScanHandler.scala
Outdated
Show resolved
Hide resolved
...alizedtrust/splice/integration/tests/ScanAppRewardsComputationTimeBasedIntegrationTest.scala
Show resolved
Hide resolved
| "return activity totals after aggregation trigger runs" in { implicit _env => | ||
| // Advance enough rounds so that at least one round is fully closed and aggregated. | ||
| // Round 0 requires multiple ticks to close. | ||
| for (_ <- 1 to 7) { |
There was a problem hiding this comment.
Why do you need 7 rounds?
I'd expect that closing rounds 0 and 1 should be sufficient; i.e, you need to advance rounds two times.
There was a problem hiding this comment.
Changed the comment, that function increments by ticks not rounds.
...alizedtrust/splice/integration/tests/ScanAppRewardsComputationTimeBasedIntegrationTest.scala
Outdated
Show resolved
Hide resolved
066c3c1 to
6c01e33
Compare
|
@meiersi-da: PTAL, most comments are addressed. I plan at least one more change; #4322 is now merged, so I will rebase on main and implement TODOs that can be now be resolved |
meiersi-da
left a comment
There was a problem hiding this comment.
Thanks a lot. Please review the bugs and address them with tests that exhibit them before fixing them.
It's OK to not build a test for rewards being computed sequentially.
| * A round is complete if the prior round also has activity records, | ||
| * proving ingestion was running continuously through it. |
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/AppActivityStore.scala
Outdated
Show resolved
Hide resolved
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/AppActivityStore.scala
Outdated
Show resolved
Hide resolved
apps/scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/ScanAppRewardsStore.scala
Outdated
Show resolved
Hide resolved
| from #${Tables.appActivityRecords} aar | ||
| where exists( | ||
| select 1 from #${Tables.appActivityRecords} | ||
| where round_number = aar.round_number - 1 |
There was a problem hiding this comment.
Check the query plan for this. This might be expensive to do that way.
If you do want to check the existence, then first query for the max and do a join against that single value, instead of potentially forcing an expensive join.
There was a problem hiding this comment.
something along the lines of
select max_round
from
(select max(round) - 1
from app_activity_records
where history_id = $historyId
)
where exists (
select 1
from app_activity_records
where history_id = $history_id and round = max_round
)also: please add tests that surface the problem wrt missing check of history_id, and check all your queries wrt the missing comparison of history_id
There was a problem hiding this comment.
Fixed; added, verifying with store tests that would have otherwise failed due to the missing historyId
| override protected def isStaleTask( | ||
| task: RewardComputationTrigger.Task | ||
| )(implicit tc: TraceContext): Future[Boolean] = | ||
| Future.successful(false) |
There was a problem hiding this comment.
It is stale if the latest completed round is larger than the task round. Should not happen, but seems like the better definition.
There was a problem hiding this comment.
Implemented this.
| lastClosedO <- store.lookupRoundOfLatestData() | ||
| earliestCompleteO <- appActivityStore.earliestRoundWithCompleteAppActivity() | ||
| latestCompleteO <- appActivityStore.latestRoundWithCompleteAppActivity() | ||
| latestComputedO <- appRewardsStore.lookupLatestRoundWithRewardComputation() |
There was a problem hiding this comment.
This actually turns out to be wrong when the trigger executes tasks in parallel and the backend gets restarted: round 2 might complete before round 1, and thus round 1 will never be completed.
So what I'd suggest to do for now is:
- return at most one task and thus ensure that they get solved one after the other
- create a TODO for an issue to compute tasks in parallel, which will require introducing a watermark table to remember the round number up to which all rewards have been computed. We might not need that. It will depend on how fast catchup works for the round trigger.
FYI: @rautenrieth-da -- catchup speed of reward computation is something to add to the test plan
...rc/main/scala/org/lfdecentralizedtrust/splice/scan/automation/RewardComputationTrigger.scala
Outdated
Show resolved
Hide resolved
| def computeRewards( | ||
| roundNumber: Long | ||
| )(implicit tc: TraceContext): Future[Unit] = | ||
| aggregateActivityTotals(roundNumber) |
There was a problem hiding this comment.
prudent engineering: add safety check that all activity records have been ingested for the round in question. Better safe than sorry.
There was a problem hiding this comment.
Fixed: added an assertion; verified with a store test
| ] = { | ||
| implicit val tc = extracted | ||
| withSpan(s"$workflowId.getRewardAccountingEarliestAvailableRound") { _ => _ => | ||
| appRewardsStore.getEarliestRoundWithActivityTotals().map { |
There was a problem hiding this comment.
AFAIR, the goal of the HTTP function is to enable other SVs to determine whether the earliest round for which this SV will eventually have the activity totals available. So we should return the earliest round for which there is a complete set of activity records and not the earliest one for which activity totals have been computed.
There was a problem hiding this comment.
I suspect we actually have no need to retrieve the earliest round with activity totals, so we can remove that function from the ScanAppRewardsStore.
751ea69 to
8f12e5d
Compare
|
@meiersi-da PTAL. FYI: In 69a15c3, In the next PR of #4384 it will need obtain data from the appropriate OpenMiningRound, and this w |
1369090 to
0effc2c
Compare
cad918f to
2ee47df
Compare
0effc2c to
79c3627
Compare
…tory_for_hash Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Co-authored-by: Simon Meier <simon@digitalasset.com> Signed-off-by: Timothy Emiola <adetokunbo@users.noreply.github.com>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Co-authored-by: Simon Meier <simon@digitalasset.com> Signed-off-by: Timothy Emiola <adetokunbo@users.noreply.github.com>
- change the condition so that two consecutive rounds have to have activity Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Co-authored-by: Simon Meier <simon@digitalasset.com> Signed-off-by: Timothy Emiola <adetokunbo@users.noreply.github.com>
Co-authored-by: Simon Meier <simon@digitalasset.com> Signed-off-by: Timothy Emiola <adetokunbo@users.noreply.github.com>
Co-authored-by: Simon Meier <simon@digitalasset.com> Signed-off-by: Timothy Emiola <adetokunbo@users.noreply.github.com>
Co-authored-by: Simon Meier <simon@digitalasset.com> Signed-off-by: Timothy Emiola <adetokunbo@users.noreply.github.com>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
- use the historyId and add unittests that verify historyId isolation - use a more efficient query, uses the index better Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
Signed-off-by: Tim Emiola <adetokunbo@emio.la>
…ctivity records Signed-off-by: Tim Emiola <adetokunbo@emio.la>
2ee47df to
94e18ca
Compare
meiersi-da
left a comment
There was a problem hiding this comment.
Thanks a lot. It looks like only some small polish is required.
| } | ||
|
|
||
| "return activity totals after aggregation trigger runs" in { implicit _env => | ||
| // TODO(#4118): Update the time management here |
There was a problem hiding this comment.
This issue is closed: #4118
Seems like the wrong reference.
| case None => | ||
| ScanResource.GetRewardAccountingActivityTotalsResponse.NotFound( | ||
| ErrorResponse( | ||
| s"Activity totals not yet computed for round $roundNumber" |
There was a problem hiding this comment.
| s"Activity totals not yet computed for round $roundNumber" | |
| s"Activity totals not (yet) computed for round $roundNumber" |
it might be never
|
|
||
| /** Trigger that drives the CIP-0104 reward computation pipeline via | ||
| * ScanAppRewardsStore.computeRewards, which will eventually run three | ||
| * idempotent steps in one transaction: |
There was a problem hiding this comment.
| * idempotent steps in one transaction: | |
| * computation steps in one transaction: |
| ): Future[Option[Long]] | ||
|
|
||
| /** Find the latest round for which all app activity records have been ingested. | ||
| * Returns None if fewer than two consecutive rounds have been ingested. |
There was a problem hiding this comment.
| * Returns None if fewer than two consecutive rounds have been ingested. |
Let's not leak implementation details on the API. We don't want callers to rely on it, and break when the implementation changes.
| * MUST only be called on rounds for which all app activity records have | ||
| * been ingested and for which the reward information has not yet been computed. | ||
| */ | ||
| def computeRewards(roundNumber: Long)(implicit |
There was a problem hiding this comment.
| def computeRewards(roundNumber: Long)(implicit | |
| def computeAndStoreRewards(roundNumber: Long)(implicit |
| required: | ||
| - round_number | ||
| - total_app_activity_weight | ||
| - active_parties_count |
There was a problem hiding this comment.
@rautenrieth-da : wdyt about also exposing the number of activity records in the round? Seems like very useful debugging info.
| def computeRewards( | ||
| roundNumber: Long | ||
| )(implicit tc: TraceContext): Future[Unit] = | ||
| aggregateActivityTotals(roundNumber) |
There was a problem hiding this comment.
@rautenrieth-da : wdyt about adding counters for:
- number of parties with activity
- number of activity records summarized
- number of parties with rewards
- number of batches created
tagged with synchronizer-id and migration-id.
It seems that having dashboards for these would seem valuable to observe the proper working of reward computation. However, I'm lacking the ops experience to judge their usefulness.
@adetokunbo : If we do so, then I'd still do that in a separate PR. Just create an issue from this comment.
| } yield () | ||
|
|
||
| /** Unnest per-verdict activity arrays and aggregate weights by party. */ | ||
| private def unnestAndAggregate(historyId: Long, roundNumber: Long) = |
There was a problem hiding this comment.
@rautenrieth-da : could I ask you to review these DB queries from a maintainers perspective? They look fine to me.
Fixes #4380
@meiersi-da: I replaced
NoOpActivityComputationwithFakeAppActivityComputationin 41b32f1. TheFakewill be replaced on this feature branch once its rebased on the nearly-done TCS changesPull Request Checklist
Cluster Testing
/cluster_teston this PR to request it, and ping someone with access to the DA-internal system to approve it./hdm_teston this PR to request it, and ping someone with access to the DA-internal system to approve it.PR Guidelines
Fixes #n, and mention issues worked on using#nMerge Guidelines