💥 Use Temporal Failures for Nexus Error Serialization#2773
💥 Use Temporal Failures for Nexus Error Serialization#2773Quinn-With-Two-Ns wants to merge 12 commits intotemporalio:masterfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b2a0c78ca
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/common/NexusUtil.java
Outdated
Show resolved
Hide resolved
d69b39f to
3495885
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 349588549b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
| return new HandlerException(info.getType(), cause, retryBehavior); | ||
| if (failure | ||
| .getMessage() | ||
| .startsWith(String.format("handler error (%s)", info.getType()))) { |
There was a problem hiding this comment.
I understand this is here to allow proper decoding of legacy errors, but for which languages exactly? If we mean to support legacy errors coming from Go or others, are we sure the casing returned by .getType() here would match the error type of other SDKs?, or should we make this condition a little bit more resilient?
There was a problem hiding this comment.
Any old failure serialized by Go or Java. This is the handler error type so that should match the Nexus spec and consistent across Go and Java
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @SuppressWarnings("deprecation") | ||
| private void sendReply( |
There was a problem hiding this comment.
The structure of this function is very hard to reason about.
There was a problem hiding this comment.
If we do this, are you still concerned about the structure?
There was a problem hiding this comment.
That one is certainly the most patent culprit, but I'd also try to restructure this to avoid mixing happy paths and failure cases.
Like, instead of:
if (taskResponse != null) {
if (!supportTemporalFailure && taskResponse.getStartOperation().hasFailure()) {
// some failure case on old server
reuturn
}
// happy path AND some failure case on newer servers
}
// other failure cases
I'd suggest restructuring the function along the line of:
if (taskResponse != null && !taskResponse.getStartOperation().hasFailure()) {
// Operation succeeded
} else if (taskResponse != null) {
// Operation failed - OperationError
} else if (response.getHandlerException()) {
// Operation failed - HandlerError
} else {
// throw ...
}
There was a problem hiding this comment.
I'm not sure that will really be clearer since we use the same gRPC method to respond to the server ion the first and second case here.
There was a problem hiding this comment.
I have an attempt to refactor to make it clearer let me push and we can discuss
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
VegetarianOrc
left a comment
There was a problem hiding this comment.
LGTM, just a couple of minor things
temporal-sdk/src/main/java/io/temporal/internal/common/NexusUtil.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/common/NexusUtil.java
Outdated
Show resolved
Hide resolved
bergundy
left a comment
There was a problem hiding this comment.
Overall LGTM, left a few small comments.
temporal-sdk/src/main/java/io/temporal/failure/DefaultFailureConverter.java
Outdated
Show resolved
Hide resolved
temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java
Outdated
Show resolved
Hide resolved
temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java
Show resolved
Hide resolved
6d50161 to
750f3a2
Compare
|
Just released the new Nexus Java SDK, takes a bit to propagate |
| } | ||
|
|
||
| // Create a copy without the message before serializing | ||
| FailureInfo failureCopy = FailureInfo.newBuilder(failureInfo).setMessage("").build(); |
There was a problem hiding this comment.
Stack trace not cleared in nexus failure metadata payload
Low Severity
nexusFailureMetadataToPayloads creates a FailureInfo copy that only clears the message via setMessage("") before serializing into the details payload, but does not also clear the stack trace. The sibling methods temporalFailureToNexusFailure and temporalFailureToNexusFailureInfo both strip message and stack trace from details. This inconsistency was also noted in review — Go and Python clear both.


💥 Use Temporal Failures for Nexus Error Serialization. The Java SDK now responds to Nexus operation failures and task failures with plain Temporal Failures. This change also allows
OperationExceptionandHandlerExceptionto have their own message and stacktrace independent of their cause. If the Server does not support the new format the SDK converts back to the old format before sending the response.Requires:
Note
Medium Risk
Behavior changes in how Nexus failures are serialized/deserialized and sent to the server; while guarded by capability checks and extensive tests, regressions could affect Nexus error propagation and metrics tagging across server versions.
Overview
Nexus task failure reporting is switched to the new wire format that returns plain Temporal
Failureprotos for both operation failures (inStartOperationResponse) and handler failures (inRespondNexusTaskFailedRequest), enabling independent message/stacktrace onOperationException/HandlerExceptionand preserving stack traces through conversions.Adds compatibility shims: SDK detects server support via request capabilities and can down-convert to legacy
UnsuccessfulOperationError/HandlerError(also forceable viatemporal.nexus.forceOldFailureFormat), while the in-memory test server is updated to advertise/support the new format and still accept the old one. Includes updated/expanded tests and bumpsnexusdependency + CI Temporal CLI version.Written by Cursor Bugbot for commit 62f038b. This will update automatically on new commits. Configure here.