@jpsamaroo (Member) commented Sep 27, 2024

Similar to MPI's collective and point-to-point (P2P) operations, this provides a convenient interface for writing code in the Single Program Multiple Data (SPMD) style. This is especially handy for implementing embarrassingly parallel algorithms, such as Distributed Data Parallel (DDP) training in ML.

Example usage:

```julia
fetch.(spmd(4; parallelize=:workers) do # Run one SPMD program per Distributed worker
    rank = spmd_rank()                  # this program's rank, in 1:4
    comm_size = spmd_size()             # size of the "communicator", here 4
    all_ranks = spmd_exchange(rank)     # send our own rank, receive a vector of all ranks
    sum_of_ranks = only(spmd_reduce(+, [rank])) # reduce all ranks with +
end)
```
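To make the DDP use case mentioned above concrete, here is a minimal sketch of data-parallel gradient averaging on top of the `spmd_*` API from this PR. Note that `make_model`, `local_batches`, `compute_gradients`, and `apply_update!` are hypothetical placeholders for user-defined training code, not part of this PR:

```julia
# Hypothetical sketch: each rank trains a model replica on its own data shard,
# and gradients are summed across ranks with spmd_reduce, then averaged locally.
fetch.(spmd(4; parallelize=:workers) do
    model = make_model()                     # each rank holds its own model replica
    for batch in local_batches(spmd_rank())  # each rank iterates its own data shard
        grads = compute_gradients(model, batch)
        summed = spmd_reduce(+, grads)       # elementwise sum of gradients across ranks
        apply_update!(model, summed ./ spmd_size()) # average locally, then update
    end
end)
```

This mirrors the usual allreduce-based DDP pattern: every rank ends each step with the same averaged gradient, so the replicas stay in sync without a parameter server.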

Todos:

  • Reconsider using RemoteChannels for communication
  • Add tags
  • Consider putting all spmd_* methods into their own module
