@zirtoshka

Scheduling-related performance issues such as data skew and load imbalance remain difficult to diagnose automatically in open-source systems like Spark.

Current limitations:

  • detection of skew requires manual inspection of the Spark UI and logs;
  • Spark does not emit a clear real-time signal saying
    “this stage is suffering from scheduling-related performance problems”;
  • automated remediation tools cannot act without such a signal.

This project takes a small but practical step towards automated remediation:

A lightweight driver-side skew detector that emits a structured event whenever it detects a scheduling-related issue in a stage.
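
To make the idea concrete, here is a minimal sketch of what such a detector's core check could look like. It is an illustration under stated assumptions, not the code in this pull request: the names `SkewEvent`, `detect_skew`, and `to_json`, the event name `StageSkewDetected`, the max-to-median heuristic, and both thresholds are hypothetical. It assumes per-task durations for a stage have already been collected on the driver (in Spark this would typically be done from a `SparkListener` via `onTaskEnd`).

```python
import json
import statistics
from dataclasses import asdict, dataclass
from typing import Optional, Sequence

# Hypothetical thresholds; the description above does not specify any.
SKEW_RATIO_THRESHOLD = 3.0  # flag a stage when max duration >= 3x the median
MIN_TASKS = 4               # too few tasks give an unreliable median


@dataclass
class SkewEvent:
    """Structured event describing a suspected skewed stage (illustrative schema)."""
    event: str
    stage_id: int
    num_tasks: int
    median_ms: float
    max_ms: float
    skew_ratio: float


def detect_skew(stage_id: int, task_durations_ms: Sequence[float]) -> Optional[SkewEvent]:
    """Return a SkewEvent if the slowest task dominates the median, else None."""
    if len(task_durations_ms) < MIN_TASKS:
        return None
    median = statistics.median(task_durations_ms)
    if median <= 0:
        return None  # avoid dividing by zero on degenerate input
    longest = max(task_durations_ms)
    ratio = longest / median
    if ratio < SKEW_RATIO_THRESHOLD:
        return None
    return SkewEvent(
        event="StageSkewDetected",
        stage_id=stage_id,
        num_tasks=len(task_durations_ms),
        median_ms=median,
        max_ms=longest,
        skew_ratio=round(ratio, 2),
    )


def to_json(event: SkewEvent) -> str:
    """Serialize the event so a remediation tool can consume it."""
    return json.dumps(asdict(event), sort_keys=True)


if __name__ == "__main__":
    # One straggler task among otherwise similar tasks.
    found = detect_skew(7, [100, 110, 95, 105, 900])
    if found is not None:
        print(to_json(found))
```

The max-to-median ratio is only one possible heuristic; it is used here because it is cheap to compute on the driver and is not distorted by the straggler itself the way a mean would be.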
