[Improvement]: The optimizer adds a cache of eq delete files to reduce repeated IO cost of eq delete files #2553

@zhongqishang

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

For large tables written by Flink, each commit writes an equality-delete (EQ DELETE) file that applies to all previously written data files. Most of the generated optimize tasks therefore read the same EQ DELETE file again and again, which causes duplicate IO cost.

How should we improve?

Each JVM in the Optimizer (TaskManager or executor) keeps a cache of EQ DELETE files, so that optimize tasks running in the same JVM can reuse an already loaded file instead of reading it again.
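
A minimal sketch of what such a per-JVM cache could look like, assuming a small LRU map keyed by the delete file path. The class name `EqDeleteFileCache`, the `cachedFileCount` method, and the loader callback are hypothetical and only illustrate the idea; they are not the project's actual API, and a real implementation would more likely bound the cache by memory size than by entry count.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical cache of loaded EQ DELETE file contents, keyed by file path.
 * One instance would be shared by all optimize tasks in the same JVM.
 */
final class EqDeleteFileCache<V> {
  private final Map<String, V> entries;

  EqDeleteFileCache(final int maxEntries) {
    // accessOrder = true keeps the least recently used entry first,
    // so removeEldestEntry evicts in LRU order once the bound is exceeded.
    this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /**
   * Returns the cached content for the given path and calls the loader only
   * on a miss. Loading happens under the lock, so concurrent tasks asking for
   * the same file trigger a single read (at the cost of serializing loads).
   */
  synchronized V get(String path, Function<String, V> loader) {
    V cached = entries.get(path);
    if (cached == null) {
      cached = loader.apply(path); // the expensive read of the delete file
      entries.put(path, cached);
    }
    return cached;
  }

  /** Number of delete files currently held in the cache. */
  synchronized int cachedFileCount() {
    return entries.size();
  }
}
```

A task would hold the cache in a static field (one instance per JVM) and call `get(path, loader)` with whatever function actually reads the EQ DELETE file; only the first task to ask for a given path pays the IO.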

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response
