Skip to content

Conversation

shangm2
Copy link
Contributor

@shangm2 shangm2 commented Apr 30, 2025

Description

  1. We are enabling thrift for task update request, and task info for critical api communication between coordinator and worker. We have two config toggles for task update request sent to worker and the task info returned to coordinator,
  2. However, there are some classes that are java interface/polymorphic fields. We keep them as json encoding for now and will migrate them in the next step.
  3. We are also doing proper change for native worker: [native] Migrate TaskStatus, TaskUpdateRequest, TaskInfo to Thrift with JSON fields in CPP #25079

Motivation and Context

  1. We observed that coordinator can spend too much cpu/heap memory on json serde for taskUpdateRequest.

Impact

Test Plan

  1. Passed verifier tests

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Improve communication between coordinator and worker with thrift serde.

@shangm2 shangm2 requested review from elharo and a team as code owners April 30, 2025 20:49
@shangm2 shangm2 requested a review from jaystarshot April 30, 2025 20:49
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Apr 30, 2025
@shangm2 shangm2 changed the title Maks TaskStatus, TaskUpdateRequest, TaskInfo Thrift ready with json fields Maks TaskUpdateRequest, TaskInfo Thrift ready with json fields May 1, 2025
@shangm2 shangm2 changed the title Maks TaskUpdateRequest, TaskInfo Thrift ready with json fields Make taskUpdateRequest and taskInfo classes Thrift ready with json fields May 1, 2025
@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch 15 times, most recently from 6c457e3 to 1f1b450 Compare May 5, 2025 16:23
SCALAR(1),
AGGREGATE(2),
WINDOW(3);
SCALAR(0),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like cpp worker expects enum to start from 0. Vivian should have more context and will ask her to comment.

private final List<TypeSignature> argumentTypes;

@ThriftConstructor
public SqlFunctionId(String signature)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like cpp worker is expecting a string.

@facebook-github-bot
Copy link
Collaborator

@shangm2 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch from 1f1b450 to d701020 Compare May 5, 2025 18:24
@facebook-github-bot
Copy link
Collaborator

@shangm2 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Collaborator

@shangm2 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch from d701020 to 1632a75 Compare May 5, 2025 21:06
@facebook-github-bot
Copy link
Collaborator

@shangm2 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch 3 times, most recently from a811b10 to f9cb110 Compare May 8, 2025 01:31
referencedType)),
Optional.empty());
return new ThriftStructMetadata(
originalType.getSimpleName() + "Wrapper",
Copy link
Contributor Author

@shangm2 shangm2 May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapper struct will look like this

Screenshot 2025-05-08 at 14 38 19

arhimondr
arhimondr previously approved these changes May 8, 2025
@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch from f9cb110 to 2b918bf Compare May 8, 2025 18:21

ThriftFieldMetadata fieldMetaData = new ThriftFieldMetadata(
fieldId,
false, false, NONE, ImmutableMap.of(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fbthrift gives the following waring if "REQUIRED" is used here for requireness in idl. Change it to NONE.

Screenshot 2025-05-08 at 11 23 01

@facebook-github-bot
Copy link
Collaborator

@vhsu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tdcmeehan
Copy link
Contributor

However, there are some classes that are java interface/polymorphic fields. We keep them as json encoding for now and will migrate them in the next step.

How do we plan to handle these polymorphic types, given that many of them are connector-specific and not known upfront? Is there a design you can share?

@shangm2
Copy link
Contributor Author

shangm2 commented May 14, 2025

However, there are some classes that are java interface/polymorphic fields. We keep them as json encoding for now and will migrate them in the next step.

How do we plan to handle these polymorphic types, given that many of them are connector-specific and not known upfront? Is there a design you can share?

Hey Tim. For connector-specific, we will use a similar design as HandleResolver.java. We will start with prism connector.
The thrift serialization will certainly be optional, and we also want to make it flexible for other options, say, if we want to move to other serde. As you may already know, for interfaces of which we already know all its implementation upfront, we will be using thrift union.

I am working with @amitkdutta to make a document that can be shared given that the company is monitoring all the typing/uploading/copy-paste thing now (maybe even before but it is getting worse).

Let me know if this helps and thanks for all the comments!

@tdcmeehan
Copy link
Contributor

Thanks @shangm2. I'm not sure if this helps you, but I don't think you should copy over an internal document, this should be written as an RFC so the open source community can work with you on the design.

@shangm2
Copy link
Contributor Author

shangm2 commented May 16, 2025

However, there are some classes that are java interface/polymorphic fields. We keep them as json encoding for now and will migrate them in the next step.

How do we plan to handle these polymorphic types, given that many of them are connector-specific and not known upfront? Is there a design you can share?

Hey @tdcmeehan . I have a rfc ready for review. prestodb/rfcs#38

@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch 2 times, most recently from 2916c02 to 5e93c8b Compare May 16, 2025 19:02
@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch from 5e93c8b to d396cab Compare May 16, 2025 19:54
@shangm2 shangm2 force-pushed the driftIDL_Generator_With_Mixed_ThriftAndJson branch from d396cab to 8a9265a Compare May 16, 2025 20:08
@shangm2 shangm2 merged commit 9f77bed into prestodb:master May 19, 2025
98 checks passed
shangm2 pushed a commit that referenced this pull request May 19, 2025
…th JSON fields in CPP (#25079)

## Description
This PR depends on #25020.
Changes include:
- Use IDL generated by coordinator and make relevant changes to enable
thrift for TaskStatus, TaskUpdateRequest, and TaskInfo on native worker.
- Remove old pipeline of using python, json, and chevron templates for
producing C++ code.

## Motivation and Context
We observed that coordinator can spend too much cpu/heap memory on json
serde for taskUpdateRequest.

## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
Verifier

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* Improve communication between coordinator and worker with thrift serde.
```
@ZacBlanco ZacBlanco mentioned this pull request May 29, 2025
21 tasks
AnuragKDwivedi pushed a commit to AnuragKDwivedi/presto-1 that referenced this pull request Jul 8, 2025
…elds (prestodb#25020)

## Description
1. We are enabling thrift for task update request, and task info for
critical api communication between coordinator and worker. We have two
config toggles for task update request sent to worker and the task info
returned to coordinator,
2. However, there are some classes that are java interface/polymorphic
fields. We keep them as json encoding for now and will migrate them in
the next step.
3. We are also doing proper change for native worker: prestodb#25079 

## Motivation and Context
1. We observed that coordinator can spend too much cpu/heap memory on
json serde for taskUpdateRequest.

## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
1. Passed verifier tests

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* Improve communication between coordinator and worker with thrift serde.
```
AnuragKDwivedi pushed a commit to AnuragKDwivedi/presto-1 that referenced this pull request Jul 8, 2025
…th JSON fields in CPP (prestodb#25079)

## Description
This PR depends on prestodb#25020.
Changes include:
- Use IDL generated by coordinator and make relevant changes to enable
thrift for TaskStatus, TaskUpdateRequest, and TaskInfo on native worker.
- Remove old pipeline of using python, json, and chevron templates for
producing C++ code.

## Motivation and Context
We observed that coordinator can spend too much cpu/heap memory on json
serde for taskUpdateRequest.

## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
Verifier

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* Improve communication between coordinator and worker with thrift serde.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:Meta PR from Meta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants