Skip to content

Conversation

DilumAluthge
Copy link
Member

No description provided.

Copy link

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.83%. Comparing base (11d491c) to head (5c361a8).

Files with missing lines Patch % Lines
ext/SlurmClusterManagerDistributedNextExt.jl 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #80      +/-   ##
==========================================
- Coverage   93.61%   91.83%   -1.79%     
==========================================
  Files           2        3       +1     
  Lines          94       98       +4     
==========================================
+ Hits           88       90       +2     
- Misses          6        8       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@DilumAluthge
Copy link
Member Author

@JamesWrigley @jpsamaroo Does this look correct?

@DilumAluthge
Copy link
Member Author

Hmmm. Is this going to be sufficient?

Because, if I look here:

_srun_cmd_without_env = `srun -D $exehome $exename $exeflags --worker`

I see that SlurmClusterManager is starting (under srun) each worker with julia --worker. But isn't julia --worker hardcoded to always use Distributed.jl?

@JamesWrigley
Copy link

Yeah that won't work, for DistributedNext you'd want to call get_worker_arg() and interpolate that into _srun_cmd_without_env: https://github.com/JuliaParallel/DistributedNext.jl/blob/7023c603014d0e78defedd2a3d753061faeabf32/src/managers.jl#L245

I might also suggest putting Distributed support in a package extension too, that way people only need to load one or the other packages.

@DilumAluthge
Copy link
Member Author

I might also suggest putting Distributed support in a package extension too, that way people only need to load one or the other packages.

Hmmm. Unfortunately, the SlurmClusterManager.SlurmManager struct subtypes Distributed.ClusterManager. It might be breaking to change that.

@DilumAluthge
Copy link
Member Author

Yeah that won't work, for DistributedNext you'd want to call get_worker_arg() and interpolate that into _srun_cmd_without_env: https://github.com/JuliaParallel/DistributedNext.jl/blob/7023c603014d0e78defedd2a3d753061faeabf32/src/managers.jl#L245

If we're going to need to make a change anyway, would it be worth exposing a public interface to the user, that provides a general interface for the user to specify the worker arg? E.g. something like #74?

My motivation for #74 was that I wanted to be able to start workers with julia -e 'import DilumsPackage; DilumsPackage.entrypoint()' (instead of julia --worker). It seems that DistributedNext would be a special case of this more general feature?

@JamesWrigley
Copy link

Hmmm. Unfortunately, the SlurmClusterManager.SlurmManager struct subtypes Distributed.ClusterManager. It might be breaking to change that.

Oh I forgot that's a requirement of implementing the cluster interface. But then won't it be necessary to subtype DistributedNext.ClusterManager? 🤔 Need to think about a clean way to do that...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants