-
Notifications
You must be signed in to change notification settings - Fork 78
[DO NOT MERGE] Add standalone activity APIs #640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
fe10e59 to
6343d13
Compare
| ACTIVITY_EXECUTION_STATUS_UNSPECIFIED = 0; | ||
| // The activity is not in a terminal status. This does not necessarily mean that there is a currently running | ||
| // attempt. The activity may be backing off between attempts or waiting for a worker to pick it up. | ||
| ACTIVITY_EXECUTION_STATUS_RUNNING = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will likely add a PAUSED status here since this is the direction workflow pause is going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think either are a wise direction. What is happening between "running" and "done" are not traditionally execution statuses. Are you going to have a "backing off" status too? Execution status has traditionally been "running or done", and this changes those expectations unnecessarily. Can you use a separate enum to provide more detailed information about the state the running activity/workflow is in and leave the execution status alone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what we are doing for workflow pause, we should be consistent for any type of execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I am saying we shouldn't do it for workflow pause either (and have mentioned it in those situations too). It's the same reason that "backing off" doesn't need to be a status. Sure it is effectively paused during backoff, but that doesn't affect that Temporal defines activities and workflows as running if they are not completed.
There are many many cases where the expectations would be violated. For instance, people expect ID reuse policy to only apply to not-running workflows, and ID conflict policy to only apply to running workflows. This violates both of those. People expect ExecutionStatus != Running query to give them all completed workflows. I could come up with many more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd really nice to be able to see paused executions and their counts in the main list view. I don't think "backing off" is the same as pause, we don't update visibility and don't have enough of an incentive to do that for intermediate transient states.
In any case, we shouldn't be having this debate here. Whatever we decide to do for workflows we will copy to activities. For now, we are not even implementing standalone activity pause. That will likely come in the second phase of this project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd really nice to be able to see paused executions and their counts in the main list view
Sure, but that doesn't mean affecting existing fields/concepts like status that have a clear defined expectation and always have
In any case, we shouldn't be having this debate here
👍
6343d13 to
3fcf444
Compare
3fcf444 to
0b0a6bf
Compare
| // When StartActivityExecution uses the ID_CONFLICT_POLICY_USE_EXISTING and there is already an existing running | ||
| // activity, OnConflictOptions defines actions to be taken on the existing running activity, updating its state. | ||
| message OnConflictOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be ok not doing this for the first version. If we are to keep this, the comments above the fields need to be updated to not refer to workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this will only be added when we do the Nexus integration. I put it here for completeness.
| } | ||
|
|
||
| // Limited activity information returned in the list response. | ||
| message ActivityListInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with workflow, this one should be called ActivityExecutionInfo and we can add ActivityExecutionExtendedInfo or something if you need things that are not in visibility. I cover this in a later comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was not done correctly for workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, and even if I did agree, there's no harm in being consistent here and making list a subset of describe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the consistency argument is that strong here. We have seen multiple times that we assume something needs to only be in describe or list and realize we actually want it in both. The two separate messages don't allow for repurposing the fields without deprecation or duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that ActivityExecutionListInfo is a more consistent name for this. The underlying entity in both cases is an activity execution; it's the amount of info that differs. Thus
ActivityExecutionInfo: full info for an activity execution, returned bydescribeActivityExecutionListInfo: reduced info for an activity execution, returned bylist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, list should be a subset of describe. I assume the omission of task queue is an oversight.
Task queue is in activity options.
And maybe the next developer doing a future CHASM archetype decides to be inconsistent in their way that they think represents the future and has plenty of justification at that time.
That's why we are having these reviews to agree on a direction. Being consistent with a single API, for the sake of consistency, is not a strong argument IMHO. We have a chance to make things better, which Dan and I believe this is achieving.
This is too small of a concern to completely upend model reuse for everyone. This can be used as justification to never use a model in more than one place. We could just inline/copy all fields everywhere.
Why is it too small of a concern? There are definitely places where this doesn't apply, nobody says we need to take this guideline as a strict rule, it should be applied as necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolution here (i.e. matching latest PR state, which is pending SDK approval) is that the API will use independent response structs:
// Information about a standalone activity.
message ActivityExecutionInfo {
// Limited activity information returned in the list response.
message ActivityExecutionListInfo {There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify whether you resolve to make sure that everything that is in list is also in non-list? Anything beyond relying on close PR scrutiny to ensure that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view is that all the information in list should be in non-list. However, it may not be a simple subset, as the example of task_queue showed (its present in non-list as part of ActivityOptions). I've added the following comment:
// When adding fields here, ensure that it is also present in ActivityExecutionInfo (note that it
// may already be present in ActivityExecutionInfo but not at the top-level).So there's that that, and PR scrutiny.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server could have a test that verifies that every field in list-info can be computed from info. It wouldn't stop PRs here merging but it would ensure the server is not released violating this.
| message DescribeActivityExecutionResponse { | ||
| temporal.api.activity.v1.ActivityExecutionInfo info = 1; | ||
|
|
||
| // A token that can be passed in via a subsequent `DescribeActivityExecutionRequest` to long poll on the activity | ||
| // state as it makes progress. | ||
| bytes long_poll_token = 2; | ||
| } | ||
|
|
||
| message ListActivityExecutionsRequest { | ||
| string namespace = 1; | ||
| // Max number of executions to return per page. | ||
| int32 page_size = 2; | ||
| // Token returned in ListActivityExecutionsResponse. | ||
| bytes next_page_token = 3; | ||
| // Visibility query, see https://docs.temporal.io/list-filter for the syntax. | ||
| string query = 4; | ||
| } | ||
|
|
||
| message ListActivityExecutionsResponse { | ||
| repeated temporal.api.activity.v1.ActivityListInfo executions = 1; | ||
| // Token to use to fetch the next page. If empty, there is no next page. | ||
| bytes next_page_token = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match workflow, describe should be a superset of list response, not a completely different structure. This makes sense with many of our SDKs where describe returns a class that extends what list returns. IMO this should be (ignoring long poll stuff):
message DescribeActivityExecutionResponse {
temporal.api.activity.v1.ActivityExecutionInfo info = 1;
temporal.api.activity.v1.ActivityExecutionExtendedInfo extended_info = 2;
// ...
}
// ...
message ListActivityExecutionsResponse {
repeated temporal.api.activity.v1.ActivityExecutionInfo executions = 1;
// ...If there are some things that don't feel like they overlap, e.g. task queue or other activity options on the shared info only represent what it was started with, then those fields can be named/documented appropriately. It makes no sense to get more information from list. Describe should always have at least what visibility does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the consistency here is important TBH. Using the same info types for workflow list and describe was an oversight that we recognize today. I would be open to changing this if message reuse simplifies anything significantly on the SDK side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it was an oversight. Do we believe that describe data is a superset of the list data? If so, we need to model it that way. The SDKs (that have types for list and describe) are modeled that way very intentionally (via inheritance or similar) because it makes perfect sense that visibility is a subset of the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented on this above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we at least agree that describe data is a superset of the list data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we at least agree that describe data is a superset of the list data?
I agree with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolution here (i.e. matching latest PR state, which is pending SDK approval) is that the API will use independent response structs:
// Information about a standalone activity.
message ActivityExecutionInfo {
// Limited activity information returned in the list response.
message ActivityExecutionListInfo {There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that.
The model already violates this. For instance, we have decided for list response we are giving a field for execution_duration. Same for close_time too unless I'm missing where these two fields are defined.
How are you planning to uphold this assurance in the absence of normal model reuse? My concern is the only way it will be upheld is if I personally review every single field very closely. I think we can expect for this to happen plenty if it's happening already before the first PR lands (again, unsure, maybe it's in a model I missed somewhere, hard to know without proper model reuse).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are you planning to uphold this assurance in the absence of normal model reuse? My concern is the only way it will be upheld is if I personally review every single field very closely.
The server could have a test that verifies that every field in list-info can be computed from info.
I think we can expect for this to happen plenty if it's happening already before the first PR lands
(Actually, I'd say this PR will be subject to more stringent validation when we start implementing with it than now when we're just eyeballing it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test server side would make me much more comfortable, thanks
| ACTIVITY_EXECUTION_STATUS_UNSPECIFIED = 0; | ||
| // The activity is not in a terminal status. This does not necessarily mean that there is a currently running | ||
| // attempt. The activity may be backing off between attempts or waiting for a worker to pick it up. | ||
| ACTIVITY_EXECUTION_STATUS_RUNNING = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think either are a wise direction. What is happening between "running" and "done" are not traditionally execution statuses. Are you going to have a "backing off" status too? Execution status has traditionally been "running or done", and this changes those expectations unnecessarily. Can you use a separate enum to provide more detailed information about the state the running activity/workflow is in and leave the execution status alone?
temporal/api/enums/v1/id.proto
Outdated
| // If the request is denied, the server returns an `ExecutionAlreadyStarted` error. | ||
| // | ||
| // See `IdConflictPolicy` for handling ID duplication with a *running* execution. | ||
| enum IdReusePolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem general purpose, it seems each top-level Temporal component has its own ID reuse/conflict policy enumerates. Shouldn't this be ActivityIdReusePolicy and ActivityIdConflictPolicy? I think this actually may have value if/when we need something different for one component or the other. From an SDK POV, I would expect these to be separately named enums too because we like consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already know that I want this for "standalone" Nexus operations, and generally these reuse and conflict policies are framework capabilities. I am already at the point of wanting to generalize this and not having to redefine for every new primitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 3 sets of enums here for their distinct uses is better than being inconsistent (even if they happen to have the same values so far). How do you expect the SDKs to model this? To be inconsistent with themselves, or are SDKs held to a higher standard of consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to get tired of redefining these enums everywhere, and for the most part, SDKs can alias types for the workflow flavors of these enums.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already redefining them in other user-facing places. SDKs can not always alias "workflow flavors", not every language works like that. It is important to see this on the whole and understand all user interfaces, not assume what user-facing aspects can do. We're not going to get tired of these two enums on a situational basis any more than we are with cancellation types or other things where we have done this. The SDKs will at least remain as consistent as possible, even if the API sadly continues to choose not to.
I am now growing convinced we need to see user-facing SDK designs before we can understand the APIs needed for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now leaning towards having a specific one for activities just to avoid mis-predicting the future where we would want to allow different archetypes to have different ID reuse and conflict policies. It's worth the duplication IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #650 targeting this PR switching to activity-specific reuse and conflict policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolution here (i.e. matching latest PR state, which is pending SDK approval) is that the API will use activity-specific reuse and conflict policies, as opposed to policies intended to be shared between different execution types.
| // This token is returned as part of the `DescribeActivityExecutionResponse`. | ||
| bytes long_poll_token = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One wonders if there is value in "activity event" ledger. People are going to long poll and wonder why they can't get updated for each attempt that has an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this and decided not to add it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the activity state changes
Let's define "activity state" in the comment here. (What is the intended definition here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolution here (i.e. matching latest PR state, which is pending SDK approval) is that server is only going to return latest state; it will not persist a ledger/history. Added documentation:
// returning the new state. Note that activity state may change multiple times between
// requests, therefore it is not guaranteed that a client making a sequence of long-poll
// requests will see a complete sequence of state changes.
| message CountActivityExecutionsRequest { | ||
| string namespace = 1; | ||
| // Visibility query, see https://docs.temporal.io/list-filter for the syntax. | ||
| string query = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the built-in search attributes for activities still TBD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not fully fleshed out yet. I believe this will be the rough set of attributes:
ActivityId
RunId
ActivityType
TaskQueue
StartTime
ExecutionTime
CloseTime
ExecutionStatus
ExecutionDuration
PauseInfo -- when we add pause support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also adding StateTransitionCount. I think that may be the list we end up going with.
| // If set, turns this request into a long poll that is unblocked when the activity reaches a terminal status. | ||
| // The wait duration is capped by the request's context deadline or by the maximum enforced long poll interval | ||
| // allowed by the server. | ||
| bool wait = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing from a UI/CLI POV. So if I want to wait on an update for anything but result I have to use one call, and to wait on result, I have to use this call separately? For workflows, interfaces can use the same call to wait on any kind of update, whether its result or events or whatever. I think we should consider doing the same for activities. I can give some suggestions on ideas if we don't want a full activity ledger, but many boil down to having one RPC that can opt-in to whether it cares about only result or any update, and that doesn't have to have been called first before it can be long polled. This can be mashed into describe if you must, but I like describe to be a fast synchronous call for all components.
I recommend something like:
message GetActivityExecutionStatusRequest {
string namespace = 1;
string activity_id = 2;
string run_id = 3;
bool include_info = 4;
bool wait_info = 5; // Cannot be true if include_info is false
bytes continue_info_wait_token = 6; // Cannot have a value if either of the above two are false
bool include_outcome = 7;
bool wait_outcome = 8; // Cannot be true if include_outcome is false
}
message GetActivityExecutionStatusRequest {
// These three only set if include_info is true (and if it's true, they are _always_ all set)
temporal.api.activity.v1.ActivityExecutionInfo current_info = 1;
temporal.api.activity.v1.ActivityExecutionExtendedInfo current_extended_info = 2;
bytes continue_info_wait_token = 3;
// This is only set if is activity completed and include_outcome is true
oneof result {
temporal.api.common.v1.Payloads result = 4;
temporal.api.failure.v1.Failure failure = 5;
}
}This is important to reduce interface polling calls that need to know both of these things. I could see an SDK however splitting these into separate calls. It also allows us to keep describe as a simple synchronous getter like it is everywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. Let me take a stab at it. This just means that we won't have Describe at all but I'm fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should absolutely have describe at least from a user POV. I understand the API is not held to a very high standard of consistency/clarity, but our SDKs are, so I guess we can reuse this API call and hide our unfortunate lack of API consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't see a good enough reason to duplicate the APIs apart from the consistency argument, which I don't think is super important. Workflows have both history and mutable state, activities only have mutable state. You don't need two APIs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't see a good enough reason to duplicate the APIs apart from the consistency argument
If you discount the good enough reason, then yes it will be hard to see a good enough reason. I was referring to the SDK at least being consisten.
I am now convinced this API cannot be designed well without designing the SDK APIs that will use it. SDK will have a synchronous describe and it sounds like API will not, so we need to see how it fits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the proposal here -- that the API should have a single endpoint with the option for long-poll for various stages including completed, thus allowing result to be retrieved and / or other status info ("describe"). The SDKs can present separate APIs if they choose to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolution here (i.e. matching latest PR state, which is pending SDK approval) is that the API will offer a method named PollActivityExecutionRequest to poll for state and/or result, optionally including input, with no-wait and long-poll variants. The name and functionality aims for clarity and increased consistency (with WorkflowExecutionUpdate specifically):
- "Poll" is accurate: the method offers two forms of poll and long-poll
- The functionality is similar to
PollWorkflowExecutionUpdateand the name thus emphasizes this consistency. - It feels better IMO to use "poll" to wait for a result than to use "describe" for that.
- It makes sense IMO that an SDK would implement "describe" and "get result" in terms of a lower-level "poll" API
| message RequestCancelActivityExecutionResponse { | ||
| } | ||
|
|
||
| message TerminateActivityExecutionRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're a bit inconsistent on whether our side-effecting calls need a request ID for dedupe. I understand that a terminate is final and therefore a successive call is "safe", but, a user could see an error for the second terminate call I assume. We do not need to solve in this PR since it's a general problem, but IMO we should see if we can make a policy of "request ID on all side-effecting calls" though I understand it causes server side effort/persistence to dedupe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this from workflow terminate, I wonder if it's worth having consistency there or not. It's not much trouble to store the terminate request ID and return an error if the activity is already terminated if the request ID does not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can tackle the "idempotency keys for all side-effecting requests" requirement/project separately, but would definitely be nice to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you think of other requests that we would want to do this with? Probably pause, unpause, update options...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything side-effecting tbh (which includes things that have no user-facing side effects, like queries, because they technically do things), but can be opened as a separate issue
| message RequestCancelActivityExecutionResponse { | ||
| } | ||
|
|
||
| message TerminateActivityExecutionRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to note that it is a bit confusing seeing PauseActivityExecution work for multiple forms of activity, but TerminateActivityExecution (and cancel and get result and describe and others) only work for one kind of activity. In the SDK we're going to have to be careful to structure it in a way that is less inconsistent than our API has chosen to structure it here. I do wish we could be clearer in the naming and/or have a pathway to really be generic in these concepts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional. We don't want to add these APIs for workflow activities yet but they are structured in a way that it would be possible to add workflow_id later to operate on a workflow activity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yet
If there are plans to, that makes sense then at least
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's inconclusive, but I didn't think we'd need to support describe for workflow activities initially either. It turns out that if we have list, we need describe. Just don't want to get into a corner where it's harder to add it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @cretz!
temporal/api/enums/v1/id.proto
Outdated
| // If the request is denied, the server returns an `ExecutionAlreadyStarted` error. | ||
| // | ||
| // See `IdConflictPolicy` for handling ID duplication with a *running* execution. | ||
| enum IdReusePolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already know that I want this for "standalone" Nexus operations, and generally these reuse and conflict policies are framework capabilities. I am already at the point of wanting to generalize this and not having to redefine for every new primitive.
| // Callbacks to be called by the server when this activity reaches a terminal status. | ||
| // Callback addresses must be whitelisted in the server's dynamic configuration. | ||
| repeated temporal.api.common.v1.Callback completion_callbacks = 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I put it here for completeness. It won't make it into the first release.
| // Callbacks to be called by the server when this activity reaches a terminal status. | ||
| // Callback addresses must be whitelisted in the server's dynamic configuration. | ||
| repeated temporal.api.common.v1.Callback completion_callbacks = 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove these before merging.
| // activity task to be eagerly executed. | ||
| // The caller is expected to have a worker available to process the task. | ||
| PollActivityTaskQueueResponse eager_task = 3; | ||
| // Link to the workflow event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I will review the comments. Thanks for catching it.
| // If true, the activity input is returned in the response. | ||
| bool include_input = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because there is no other way of getting updates when the activity is running, unlike workflow where you can wait for history events. But I 100% agree that we want this for workflows too, we're just focusing on standalone activities right now.
| // This token is returned as part of the `DescribeActivityExecutionResponse`. | ||
| bytes long_poll_token = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this and decided not to add it for now.
| message CountActivityExecutionsRequest { | ||
| string namespace = 1; | ||
| // Visibility query, see https://docs.temporal.io/list-filter for the syntax. | ||
| string query = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not fully fleshed out yet. I believe this will be the rough set of attributes:
ActivityId
RunId
ActivityType
TaskQueue
StartTime
ExecutionTime
CloseTime
ExecutionStatus
ExecutionDuration
PauseInfo -- when we add pause support
| // If set, turns this request into a long poll that is unblocked when the activity reaches a terminal status. | ||
| // The wait duration is capped by the request's context deadline or by the maximum enforced long poll interval | ||
| // allowed by the server. | ||
| bool wait = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. Let me take a stab at it. This just means that we won't have Describe at all but I'm fine with that.
| message RequestCancelActivityExecutionResponse { | ||
| } | ||
|
|
||
| message TerminateActivityExecutionRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this from workflow terminate, I wonder if it's worth having consistency there or not. It's not much trouble to store the terminate request ID and return an error if the activity is already terminated if the request ID does not match.
| message RequestCancelActivityExecutionResponse { | ||
| } | ||
|
|
||
| message TerminateActivityExecutionRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional. We don't want to add these APIs for workflow activities yet but they are structured in a way that it would be possible to add workflow_id later to operate on a workflow activity.
| temporal.api.common.v1.Payloads result = 2; | ||
| // The failure if the activity completed unsuccessfully. | ||
| temporal.api.failure.v1.Failure failure = 3; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth comparing the comments in PollWorkflowExecutionUpdateResponse. When we introduced that there was some discussion, leading to a decision to use a non-error response to signify that a long-polling client should retry.
| // Updated on terminal status. | ||
| int64 state_transition_count = 10; | ||
| // Updated once on scheduled and once on terminal status. | ||
| int64 state_size_bytes = 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why inconsistently make these as fields instead of search attributes like workflow? Are they also in search attributes, or are we just forbidding the use of this information in queries for activities even though we allow it for workflows? By making them fields, we are ensuring that list and describe field behaviors diverge for the same field which is confusing to have to explain for the same field name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is available for workflows though...
api/temporal/api/workflow/v1/message.proto
Lines 43 to 45 in 1ae5b67
| string task_queue = 13; | |
| int64 state_transition_count = 14; | |
| int64 history_size_bytes = 15; |
Search attributes returned here are custom ones provided by the user in the start request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I was confused by the name list info here and not there (I probably won't be the last to be confused by this inconsistency). There are other mismatched fields though, but can bring those up separately.
9a8c86d to
6a3e9b1
Compare
|
@cretz the PR has changed substantially, would you like to review again? |
| temporal.api.deployment.v1.WorkerDeploymentOptions deployment_options = 6; | ||
| } | ||
|
|
||
| message PollActivityTaskQueueResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want a standalone activity to have access to its own run ID? If so then PollActivityTaskQueueResponse should be extended to include it.
| // Activity run ID. If empty the request targets the latest run. | ||
| string run_id = 3; | ||
| // Exclude the info field from the response. | ||
| bool exclude_info = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this field exclude when others are include? If it's to make returning info the default for this operation, I don't think it's the right choice as I imagine the most common use case for polling execution would be waiting for activity completion and getting only the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine the most common use case for polling execution would be waiting for activity completion and getting only the result.
We make long poll opt-in on long-poll-optional calls. So the default here actually represents what a "describe activity" would do, but they aren't making a describe activity, so they are just making this behave like that by default. The "poll for result" which they've also put inside of this same call is the opt-in alternative call use. It makes sense from a caller/HTTP API POV to have the non-poll "describe" behavior be the default behavior despite the RPC name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this field exclude when others are include? If it's to make returning info the default for this operation, I don't think it's the right choice as I imagine the most common use case for polling execution would be waiting for activity completion and getting only the result.
Good point @maciejdudko. We made it exclude_info because in an earlier version of this PR the method was named DescribeActivityExecution. However, I subsequently renamed it PollActivityExecution in line with our aim for consistency and clarity in the gRPC API. The new name reflects the fact that this is a low-level API, which SDKs will build functionality such as DescribeActivity and FetchResult(longPoll: bool) on top of. As such, I think you're right, that the fields should now all have include semantics and that it is appropriate for the low-level "poll" gRPC API to default to not including any of the bulky data. Clients (such as SDK authors) making use of the API will not have difficulty understanding that they have to set one of the include fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clients (such as SDK authors) making use of the API will not have difficulty understanding that they have to set one of the include fields.
Disagree. I think curl https://my-temporal-host/namespaces/my-ns/activities/my-activity should return info like curl https://my-temporal-host/namespaces/my-ns/workflows/my-workflow, curl https://my-temporal-host/namespaces/my-ns/schedules/my-schedule, etc. I think we're regressing and making our APIs harder to use than we did in the past. I don't think we have to put the onus on users and clients to simplify complex APIs, we can develop non-complex APIs from the get-go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cretz makes a good point with curl. I'm on the side of leaving it as exclude_info now.
Missing API support for: - [ ] PauseActivity - [ ] UnpauseActivity - [ ] ResetActivity - [ ] UpdateActivityOptions
…io#651) Introduce a unified method for polling / long-polling an activity execution for status and/or outcome.
…io#650) Use execution-specific ID reuse & conflict policies instead of introducing shared policies for all CHASM executions.
5efe8a9 to
22d2e09
Compare
What changed?
Added first revision of APIs for standalone activities.
Missing API support for:
Notes for reviewers
ActivityExecution? What would it be called?EntityExecution?StartActivityExecutionRequest?ActivityOptionsinStartActivityExecutionRequest? Would more of the fields in the request be considered options in the future?Server PR
Not done yet, this will be the next step.