Support --py-files to distribute files to executors #436
@carsonwang @pang-wu Could you please help review this PR? Thank you very much!
@myandpr Thanks for the contribution, can you add a unit test?
module_path.write_text("VALUE = 'pyfiles works'\n")

py_files_path = tmp_path / "extra_module.zip"
with zipfile.ZipFile(py_files_path, "w") as zip_file:
You don't have to zip it; Spark should support submitting .py files directly.
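A minimal sketch of this suggestion, assuming the test only needs a single importable module: the .py file itself can serve as the `--py-files` value, since spark-submit documents .zip, .egg, and .py entries for that option. The helper name `make_py_files_arg` and the file name `extra_module.py` are hypothetical, only standard-library calls are used, and nothing here invokes raydp-submit.

```python
import tempfile
from pathlib import Path


def make_py_files_arg(directory: Path) -> str:
    """Write a one-line module and return its path for use as a --py-files value."""
    module_path = directory / "extra_module.py"  # hypothetical file name
    module_path.write_text("VALUE = 'pyfiles works'\n")
    # No zipfile step: the plain .py path is passed as-is.
    return str(module_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(make_py_files_arg(Path(tmp)))
```

The returned string would take the place of the `extra_module.zip` path in the test, which removes the `zipfile` dependency from the fixture.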
@myandpr Thank you for the PR. The cluster manager name "OTHERS" is one we added in this customized SparkSubmit.scala for Ray. We used the general name "OTHERS" instead of "RAY" because we tried to upstream the changes to Spark in the early days. I feel there is no need to have both "OTHERS" and "RAY" in the file. A few options, such as args.jars, have already been added for "OTHERS", but others are missing, which is the problem you ran into. I think you can continue to use "OTHERS" and just add the missing options.

I created another PR: #441
Problem
We encountered an issue when submitting a job using the following command:
raydp-submit --ray-conf /root/ray.conf --py-files file.zip main.py
The parameters for distributing files (such as --py-files) do not properly distribute the files to the executors. As a result, the executors cannot access or import the code from the files specified in --py-files.
Below is the error stack trace:
How to solve it
Previously, the Ray cluster master was grouped under OTHERS, so the option assigner skipped writing --files/--archives into spark.files and spark.archives, which left executors without the distributed files.
Explicitly recognizing ray:// as RAY and including RAY in the related distribution logic is what makes these settings take effect now.
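The skipping behavior described above can be illustrated with a toy model. This is only a sketch, written in Python to match the test code in this thread; the real logic lives in the Scala file SparkSubmit.scala. The flag values, the `OptionAssigner` fields, the helper names, and the sample file names are all assumptions made for illustration. Each rule carries a bitmask of cluster managers, and a value reaches the Spark conf only when the active manager's bit is in that mask.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Bit flags for cluster managers. The values are illustrative, not Spark's.
YARN = 1
STANDALONE = 2
LOCAL = 4
OTHERS = 8  # the bucket a ray:// master falls into in this toy model


@dataclass
class OptionAssigner:
    value: Optional[str]   # parsed command-line value, e.g. from --files
    cluster_managers: int  # bitmask of managers the rule applies to
    conf_key: str          # Spark property that receives the value


def apply_rules(rules: List[OptionAssigner], cluster_manager: int) -> Dict[str, str]:
    """Copy each set option into the conf only if its rule covers the manager."""
    conf: Dict[str, str] = {}
    for rule in rules:
        if rule.value is not None and (rule.cluster_managers & cluster_manager) != 0:
            conf[rule.conf_key] = rule.value
    return conf


def build_rules(managers: int) -> List[OptionAssigner]:
    """Build the two file-distribution rules with the given manager mask."""
    return [
        OptionAssigner("data.txt", managers, "spark.files"),
        OptionAssigner("deps.tar.gz", managers, "spark.archives"),
    ]


# Mask without the Ray bucket: both rules are skipped.
before = apply_rules(build_rules(YARN | STANDALONE | LOCAL), OTHERS)
# Mask widened to include the Ray bucket: both properties are written.
after = apply_rules(build_rules(YARN | STANDALONE | LOCAL | OTHERS), OTHERS)

print(before)  # {}
print(after)   # {'spark.files': 'data.txt', 'spark.archives': 'deps.tar.gz'}
```

In this model, introducing a separate RAY flag and adding OTHERS to the rules that lack it come to the same thing: the mask on each file-distribution rule has to include whichever bit the ray:// master resolves to.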