@gaogaotiantian
Contributor

@gaogaotiantian gaogaotiantian commented Dec 2, 2025

What changes were proposed in this pull request?

This PR hijacks stdin/stdout/stderr in run-tests.py so that breakpoint() works in test cases. Users can now write breakpoint() in a test and debug it interactively when they run it through run-tests.py.

Why are the changes needed?

It is difficult to debug a failing test. This change also lays the foundation for temporarily redirecting stderr/stdout to sys.stdout.

Does this PR introduce any user-facing change?

No. The original behavior should be preserved. The performance impact is negligible.

How was this patch tested?

Add a breakpoint() line to a test case, then run the test with ./run-tests --testname="...". Execution stops at the breakpoint and brings up pdb, and the user can debug the test through the pdb interface.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the PYTHON label Dec 2, 2025
Contributor

@allisonwang-db allisonwang-db left a comment

Could you add more details in PR descriptions regarding how to use breakpoint() or could you add some documentation in testing pyspark?

@gaogaotiantian
Contributor Author

> Could you add more details in PR descriptions regarding how to use breakpoint()

breakpoint() is a Python builtin. You write breakpoint() wherever you want execution to pause, and it should bring up a debugger. That is not specific to pyspark. If you are running the driver directly, it should just work (on the driver side). However, run-tests.py runs the tests in a subprocess, which requires some extra work to set up the communication.
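As a small illustration of the builtin itself (independent of this PR): breakpoint() simply calls sys.breakpointhook, which by default starts pdb. The sketch below swaps in a recording hook so the dispatch is visible without opening a debugger. The names `recorded_calls` and `recording_hook` are made up for this example.

```python
import sys

recorded_calls = []


def recording_hook(*args, **kwargs):
    # Stand-in for a debugger: just remember that we were called.
    recorded_calls.append((args, kwargs))


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()  # dispatches to sys.breakpointhook, i.e. recording_hook
finally:
    # Restore the default hook so later breakpoint() calls open pdb again.
    sys.breakpointhook = original_hook
```

With the default hook, the PYTHONBREAKPOINT environment variable selects the debugger, and PYTHONBREAKPOINT=0 disables breakpoint() entirely.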

@HyukjinKwon
Member

Yeah, but let's document or explain how you tested this. Otherwise, developers won't know that this exists.

@gaogaotiantian
Contributor Author

I updated the PR description. I'm not sure which details should be added. breakpoint() is a builtin function you can add to your source code to trigger a breakpoint (by default it brings up pdb). It works automatically when you run a script. However, we run tests in a subprocess, which requires some stdin/stdout handling between run-tests and the actual test - that's what this PR is about.
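To illustrate why the stdin/stdout wiring matters, here is a rough sketch that is not the code in this PR: pdb in a child process reads its commands from the child's stdin and writes its prompt to the child's stdout, so the parent has to connect those streams for the debugger to be usable. `CHILD_SOURCE` and `run_child` are hypothetical names, and the commands are piped in non-interactively to keep the example self-contained.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a test that runs in a child process.
CHILD_SOURCE = """
x = 41
breakpoint()
print("result", x + 1)
"""


def run_child(debugger_commands: str) -> str:
    env = dict(os.environ)
    # Make sure breakpoint() is not disabled or redirected by the environment.
    env.pop("PYTHONBREAKPOINT", None)
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=debugger_commands,  # becomes the child's stdin, read by pdb
        capture_output=True,      # the child's stdout carries the pdb prompt
        text=True,
        env=env,
        timeout=60,
    )
    return completed.stdout


# Ask pdb to print x, then continue to the end of the child script.
output = run_child("p x\ncontinue\n")
print(output)
```

In an interactive setting the parent would relay the user's terminal to the child instead of sending a fixed string of commands.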
