-
Notifications
You must be signed in to change notification settings - Fork 158
fix(re-create): force rescan if inbound vote monitoring fails using a context that can timeout #4278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughIntroduces structured monitoring via ErrTxMonitor and a monitorErrCh wired through PostVoteInbound and MonitorVoteInboundResult. Adds observer logic to handle monitor errors and optionally force-reset last-scanned blocks. Updates WithLastBlockScanned signature and usages. Adds context CopyWithTimeout utility. Adjusts tests and mocks accordingly. Minor range-calculation helper and changelog entry. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Obs as Observer
participant ZC as ZetacoreClient
participant Mon as Monitor Goroutine
participant H as handleMonitoringError
rect rgb(245,250,255)
note over Obs: Post vote inbound
Obs->>ZC: PostVoteInbound(ctx, gas, retryGas, msg, monitorErrCh)
activate ZC
ZC-->>Obs: (zetaTxHash, ballotIndex)
end
par Start monitoring
ZC->>Mon: MonitorVoteInboundResult(ctx, zetaTxHash, retryGas, msg, monitorErrCh)
activate Mon
alt success
Mon-->>ZC: nil
Mon-->>monitorErrCh: ErrTxMonitor{Err:nil}
else error/timeout
Mon-->>monitorErrCh: ErrTxMonitor{Err, InboundBlockHeight, ZetaTxHash, BallotIndex}
end
deactivate Mon
and Handle monitor result
Obs->>H: handleMonitoringError(monitorErrCh)
activate H
alt ErrTxMonitor with InboundBlockHeight>0
H->>Obs: WithLastBlockScanned(height, true)
note right of Obs: Set forceResetLastScanned
else no error or no height
H-->>Obs: no state change
end
deactivate H
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #4278 +/- ##
===========================================
- Coverage 66.05% 65.98% -0.08%
===========================================
Files 454 455 +1
Lines 33566 33649 +83
===========================================
+ Hits 22172 22203 +31
- Misses 10433 10481 +48
- Partials 961 965 +4
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
zetaclient/context/context.go (1)
43-50: Consider validating the timeout parameter.The function does not validate whether
timeoutis positive. Whilegoctx.WithTimeouthandles non-positive durations by creating an already-expired context, explicitly documenting or validating this behavior would improve clarity and prevent subtle bugs.zetaclient/context/context_test.go (2)
79-87: Timing assertions may be flaky in slow CI environments.The upper bound assertion
elapsed < timeout*2(line 86) may fail in resource-constrained CI runners where goroutine scheduling can be delayed. Consider using a more generous upper bound (e.g.,timeout + 1*time.Second) or leveraging test utilities that account for execution environment variability.
60-88: Add test coverage for the error path.The test verifies the happy path where
AppContextis present, but does not cover the scenario whereFromContextfails (i.e., when the source context lacksAppContext). This path should be tested to ensure the function still returns a valid timeout context and cancel function.Add a test case similar to:
func TestCopyWithTimeout_NoAppContext(t *testing.T) { // ARRANGE ctx1 := goctx.Background() // no AppContext timeout := 100 * time.Millisecond // ACT ctx2, cancel := context.CopyWithTimeout(ctx1, goctx.Background(), timeout) defer cancel() // ASSERT // Verify that AppContext is not present _, err := context.FromContext(ctx2) assert.ErrorIs(t, err, context.ErrNotSet) // Verify that timeout still works <-ctx2.Done() assert.ErrorIs(t, ctx2.Err(), goctx.DeadlineExceeded) }pkg/errors/monitor_error.go (1)
5-19: Review the nilErrpattern for semantic clarity.The
Error()method returns"monitoring completed without error"whenErris nil, which is semantically unusual—this type is being used to signal both success and failure on the monitoring channel. Consider whether:
- The channel should send
nilto signal success (idiomatic Go) andErrTxMonitoronly for errors, OR- A separate success signal type is warranted if additional metadata (block height, tx hash, ballot index) must accompany successful completion.
As implemented, calling
.Error()on a success case produces an error-like string, which may confuse logging and error handling downstream.If the current design is intentional and success metadata is required, document this dual-purpose behavior clearly in a comment above the type definition. Otherwise, refactor to separate success/failure signaling:
+// ErrTxMonitor represents an error from the monitoring goroutine. +// When Err is nil, monitoring completed successfully but metadata is preserved. +// Callers should check if Err is nil before treating this as an error. type ErrTxMonitor struct { Err error InboundBlockHeight uint64 ZetaTxHash string BallotIndex string }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (21)
changelog.md(1 hunks)pkg/errors/monitor_error.go(1 hunks)zetaclient/chains/base/confirmation.go(1 hunks)zetaclient/chains/base/confirmation_test.go(2 hunks)zetaclient/chains/base/observer.go(11 hunks)zetaclient/chains/base/observer_test.go(1 hunks)zetaclient/chains/bitcoin/observer/db.go(1 hunks)zetaclient/chains/bitcoin/observer/db_test.go(1 hunks)zetaclient/chains/evm/observer/observer.go(1 hunks)zetaclient/chains/evm/observer/observer_test.go(1 hunks)zetaclient/chains/interfaces/zetacore.go(2 hunks)zetaclient/chains/solana/observer/inbound.go(1 hunks)zetaclient/chains/sui/observer/observer_test.go(2 hunks)zetaclient/chains/ton/observer/observer_test.go(1 hunks)zetaclient/context/context.go(2 hunks)zetaclient/context/context_test.go(2 hunks)zetaclient/testutils/mocks/zetacore_client.go(3 hunks)zetaclient/testutils/mocks/zetacore_client_opts.go(1 hunks)zetaclient/zetacore/client_monitor.go(6 hunks)zetaclient/zetacore/client_vote.go(3 hunks)zetaclient/zetacore/tx_test.go(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
⚙️ CodeRabbit configuration file
Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.
Files:
zetaclient/chains/solana/observer/inbound.gozetaclient/chains/base/observer_test.gozetaclient/chains/ton/observer/observer_test.gozetaclient/context/context_test.gozetaclient/chains/evm/observer/observer_test.gozetaclient/chains/bitcoin/observer/db_test.gozetaclient/chains/evm/observer/observer.gozetaclient/context/context.gozetaclient/chains/base/confirmation_test.gozetaclient/testutils/mocks/zetacore_client_opts.gozetaclient/chains/interfaces/zetacore.gozetaclient/zetacore/tx_test.gozetaclient/chains/sui/observer/observer_test.gozetaclient/testutils/mocks/zetacore_client.gopkg/errors/monitor_error.gozetaclient/chains/base/confirmation.gozetaclient/zetacore/client_monitor.gozetaclient/zetacore/client_vote.gozetaclient/chains/bitcoin/observer/db.gozetaclient/chains/base/observer.go
🧬 Code graph analysis (10)
zetaclient/chains/base/observer_test.go (1)
zetaclient/chains/base/observer.go (1)
Observer(45-87)
zetaclient/context/context_test.go (3)
zetaclient/context/app.go (1)
New(38-45)zetaclient/config/config_chain.go (1)
New(15-32)zetaclient/context/context.go (3)
WithAppContext(15-17)CopyWithTimeout(43-50)FromContext(20-27)
zetaclient/chains/base/confirmation_test.go (1)
zetaclient/chains/base/observer.go (1)
Observer(45-87)
zetaclient/chains/interfaces/zetacore.go (1)
pkg/errors/monitor_error.go (1)
ErrTxMonitor(6-11)
zetaclient/chains/sui/observer/observer_test.go (1)
pkg/errors/monitor_error.go (1)
ErrTxMonitor(6-11)
zetaclient/testutils/mocks/zetacore_client.go (1)
pkg/errors/monitor_error.go (1)
ErrTxMonitor(6-11)
zetaclient/zetacore/client_monitor.go (3)
pkg/errors/monitor_error.go (1)
ErrTxMonitor(6-11)pkg/retry/retry.go (1)
Retry(126-136)zetaclient/logs/fields.go (1)
FieldZetaTx(25-25)
zetaclient/zetacore/client_vote.go (1)
pkg/errors/monitor_error.go (1)
ErrTxMonitor(6-11)
zetaclient/chains/bitcoin/observer/db.go (2)
pkg/chains/bitcoin.go (1)
IsBitcoinRegnet(46-48)zetaclient/chains/bitcoin/observer/observer.go (1)
RegnetStartBlock(69-69)
zetaclient/chains/base/observer.go (9)
zetaclient/chains/evm/observer/observer.go (1)
Observer(59-73)zetaclient/chains/bitcoin/observer/observer.go (1)
Observer(94-132)zetaclient/chains/solana/observer/observer.go (1)
Observer(70-87)zetaclient/chains/sui/observer/observer.go (1)
Observer(19-32)zetaclient/chains/ton/observer/observer.go (1)
Observer(20-30)pkg/errors/monitor_error.go (1)
ErrTxMonitor(6-11)zetaclient/context/context.go (1)
CopyWithTimeout(43-50)zetaclient/chains/interfaces/zetacore.go (1)
ZetacoreClient(66-124)zetaclient/logs/fields.go (3)
FieldZetaTx(25-25)FieldBallotIndex(35-35)FieldBlock(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build-zetanode
- GitHub Check: analyze (go)
- GitHub Check: build
- GitHub Check: gosec
- GitHub Check: lint
- GitHub Check: build-and-test
🔇 Additional comments (22)
zetaclient/context/context.go (1)
44-47: Return the cancel function in the error path.The error path returns only the context from
goctx.WithTimeout, discarding the cancel function. This creates inconsistent return behavior and prevents callers from properly releasing resources associated with the timeout context.Apply this diff to return both values:
func CopyWithTimeout(from, to goctx.Context, timeout time.Duration) (goctx.Context, goctx.CancelFunc) { app, err := FromContext(from) if err != nil { - return goctx.WithTimeout(to, timeout) + ctxWithTimeout, cancel := goctx.WithTimeout(to, timeout) + return ctxWithTimeout, cancel } ctxWithTimeout, cancel := goctx.WithTimeout(to, timeout) return WithAppContext(ctxWithTimeout, app), cancel }Likely an incorrect or invalid review comment.
zetaclient/chains/base/confirmation_test.go (3)
70-79: LGTM! Test case validates scan range for reset scenario.The new test case correctly validates the scan range calculation when
lastScannedsignificantly lags behindlastBlock(50 vs 100), ensuring the observer handles forced resets appropriately.
86-86: LGTM! API signature updated correctly.The call to
WithLastBlockScannednow includes the second boolean parameter (false), aligning with the API evolution across the codebase.
144-144: LGTM! Consistent API usage.The signature update is consistent with the broader API change to
WithLastBlockScanned(lastScanned uint64, includeInBatch bool).zetaclient/chains/ton/observer/observer_test.go (1)
247-247: LGTM! Mock signature updated for monitoring channel.The mock expectation correctly accommodates the additional
monitorErrChparameter introduced toPostVoteInbound, enabling error monitoring for inbound vote processing.zetaclient/chains/solana/observer/inbound.go (1)
57-57: LGTM! API signature updated correctly.The call to
WithLastBlockScannedincludes the second boolean parameter (false), properly updating the last scanned block for metrics when no new signatures are found and the slot query succeeds.zetaclient/chains/bitcoin/observer/db_test.go (1)
99-99: LGTM! Test correctly resets state with updated API.The call to
WithLastBlockScanned(0, false)properly resets the last scanned block to trigger the RPC code path, enabling validation of error handling.zetaclient/testutils/mocks/zetacore_client_opts.go (1)
50-50: LGTM! Mock helper updated for monitoring channel.The
WithPostVoteInboundhelper now correctly mocks the extendedPostVoteInboundsignature with five parameters, including themonitorErrChchannel for vote monitoring.zetaclient/chains/evm/observer/observer_test.go (1)
241-241: LGTM: Test correctly updated for new signature.The addition of the
falseparameter toWithLastBlockScannedaligns with the broader API change. In this test context, where the last block is intentionally reset to 0 to trigger RPC loading, passingfalse(presumably meaning "don't skip") is the correct behavior.zetaclient/zetacore/tx_test.go (2)
240-245: LGTM: Test updated correctly for new signature.The addition of
nilfor themonitorErrChparameter is appropriate for this unit test, which focuses on the "already voted" scenario and doesn't require monitoring channel verification.
282-287: LGTM: Test updated correctly for new signature.Passing
nilfor themonitorErrChparameter is appropriate here, as this test validates the basic monitoring flow without requiring error channel assertions.zetaclient/chains/interfaces/zetacore.go (2)
13-13: LGTM: Import alias added appropriately.The
zetaerrorsalias avoids collision with the standarderrorspackage and clearly identifies the custom error types.
49-54: LGTM: Interface signature extended correctly.The addition of
monitorErrCh chan<- zetaerrors.ErrTxMonitortoPostVoteInboundis well-typed (send-only channel) and consistently propagated across implementations and mocks according to the PR context.zetaclient/chains/sui/observer/observer_test.go (2)
18-18: LGTM: Import alias added appropriately.Consistent with the pattern used in other test files to avoid collision with the standard
errorspackage.
605-615: LGTM: Test mock updated correctly for new signature.The
CatchInboundVotestest helper correctly:
- Adds the
monitorErrChparameter to the callback signature (unused via_, which is appropriate for this test)- Updates the mock expectation to match the new 5-parameter signature
The test continues to capture inbound votes as intended without requiring monitoring channel verification.
zetaclient/chains/base/observer_test.go (1)
196-196: LGTM: Test correctly updated for new API signature.The addition of the
falseparameter aligns with the updatedWithLastBlockScannedsignature across the codebase.zetaclient/chains/base/confirmation.go (1)
98-98: LGTM: Example documentation accurately illustrates edge case.The added example correctly demonstrates the scenario where
lastScannedis significantly behindlastBlock, producing 41 unscanned blocks in the range [51, 91]. The parenthetical note provides helpful context about monitoring-driven resets.zetaclient/chains/evm/observer/observer.go (1)
264-264: LGTM: Correct parameter added for initial block setup.The
falseparameter appropriately indicates this is a normal initialization path rather than a forced reset scenario.zetaclient/chains/bitcoin/observer/db.go (2)
50-50: LGTM: API signature correctly updated.The
falseparameter is appropriate for this initialization path.
55-55: LGTM: Regtest initialization correctly updated.Consistent usage of the
falseparameter for the regtest-specific initialization.zetaclient/zetacore/client_vote.go (2)
159-159: LGTM: Structured error monitoring added.The new
monitorErrChparameter enables structured error propagation from the monitoring goroutine, improving observability of vote monitoring failures.
193-210: Revise log and reevaluate monitoring context
- In client_vote.go’s goroutine select, change
to report the actual cancellation reason.- case <-ctx.Done(): - c.logger.Error().Msg("context cancelled: timeout") + case <-ctx.Done(): + c.logger.Error().Err(ctx.Err()).Msg("context cancelled: failed to send monitor error")- Confirm whether the monitor goroutine should share the parent
ctx(and be cancelled when the request ends) or use a separate context that outlives the caller to avoid dropping monitoring errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm think about changing our approach for this one.
Initially we were in a context where we wanted a quick patch for the missed inbound issue.
This rescan approach was the simplest implementation for it.
But this has drawback, like rescanning blocks that you already know will not contains event and blocking the scanning process if for some reason one inbound can't be processed.
In the end I think the best approach would be to keep the current workflow for the block scanning and report the events that failed to be observed in a internal tracker cache.
We have a separated workflow to iterate these missed observation, the same as the current behavior with the tracker lists.
This way we ensure that most ZetaClient will remain at the same pace of block observation and one is not falling behind.
I also think the previous solution could be improved. Creating |
|
Close this PR, it's replaced by #4295 |
Description
This PR reintroduces the following changes with additional enhancement
Closes #4245
How Has This Been Tested?
Note
Force a rescan when inbound vote monitoring fails by propagating errors via a channel and resetting last scanned height, using a timeout-bound context; update interfaces, observers, and tests accordingly.
ErrTxMonitorand monitor error channel to propagate monitoring failures.Observer.PostVoteInbound, passmonitorErrChand spawn handlerhandleMonitoringError(...)to log andForceSaveLastBlockScanned(inboundHeight-1)to trigger rescan.MonitoringErrHandlerRoutineTimeoutand usezctx.CopyWithTimeoutto bound monitoring duration.forceResetLastScannedstate; changeWithLastBlockScanned(block, forceReset)signature and behavior to avoid overwriting when reset is pending.ForceSaveLastBlockScannedto persist lower heights and enforce rescan.PostVoteInbound(ctx, gasLimit, retryGasLimit, msg, monitorErrCh); monitoring goroutine uses same ctx and reports errors via channel.MonitorVoteInboundResultacceptsmonitorErrChand resends on OOG; improved logging.CopyWithTimeout(from, to, timeout)plus tests.ZetacoreWriter/PostVoteInboundsignature; regenerate/update mocks and test helpers.CopyWithTimeouttests.Written by Cursor Bugbot for commit ca462b5. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation