@Angeladadd Angeladadd commented Jul 17, 2025

Description

This PR introduces a two-phase optimisation to address communication bottlenecks in the copy_states routine during distributed resampling:

  1. Deduplication & Local Replication:
  • Transmit each unique particle only once across ranks.
  • Reconstruct duplicates locally with multi-threaded replication.
  2. Communication-Efficient Redistribution:
  • Reformulates resampling redistribution as a lightweight rank-level transportation problem.
  • Minimises unnecessary cross-rank transfers caused by global index ordering.
  • Solved efficiently with HiGHS, adding negligible overhead relative to communication savings.
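
As a rough illustration of the two phases (not the code in this PR), the sketch below counts cross-rank copies for a given set of resampling indices in three ways: one message per remote copy, one message per unique (particle, destination rank) pair, and the smallest number of copies any slot assignment could achieve. It assumes 1-based global particle indices, 0-based ranks, and a contiguous layout of `nprt_per_rank` particles per rank. `owner_rank` and `transfer_counts` are hypothetical names, and the lower bound is computed directly from per-rank surplus rather than with HiGHS.

```julia
# Hypothetical helpers for illustration only; not taken from the PR.

# 0-based rank owning 1-based global particle index `id`,
# assuming a contiguous layout with `nprt_per_rank` particles per rank.
owner_rank(id::Int, nprt_per_rank::Int) = (id - 1) ÷ nprt_per_rank

# Slot k (owned by owner_rank(k)) receives a copy of particle indices[k].
function transfer_counts(indices::AbstractVector{Int}, nprt_per_rank::Int)
    nranks = length(indices) ÷ nprt_per_rank
    naive = 0                               # one message per remote copy
    unique_pairs = Set{Tuple{Int,Int}}()    # (particle id, destination rank)
    supply = zeros(Int, nranks)             # offspring produced by each rank
    for (k, id) in enumerate(indices)
        src = owner_rank(id, nprt_per_rank)
        dest = owner_rank(k, nprt_per_rank)
        supply[src + 1] += 1
        if src != dest
            naive += 1
            push!(unique_pairs, (id, dest))
        end
    end
    # Each rank can keep at most nprt_per_rank of its own offspring,
    # so only the surplus has to leave the rank under any assignment.
    lower_bound = sum(max(0, s - nprt_per_rank) for s in supply)
    return (naive = naive, dedup = length(unique_pairs), lower_bound = lower_bound)
end

# 3 ranks x 4 particles, indices in global sorted order.
indices = [1, 1, 1, 1, 1, 1, 5, 6, 7, 8, 9, 10]
counts = transfer_counts(indices, 4)
println(counts)  # (naive = 4, dedup = 3, lower_bound = 2)
```

In this example the sorted ordering pushes two of rank 1's own particles on to rank 2, even though rank 0's surplus could go to rank 2 directly. That is the kind of avoidable transfer the redistribution step targets.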

Issue

#116

Testing

  • Added integration test 7, covering the cases that use the optimised copy-states and resampling functions
  • Added an MPI test for the optimised copy-states function
  • Added a unit test for the optimised resampling function
  • Added Slurm scripts for
    • running run_particle.jl end to end
    • running the MPI test for the optimised copy-states function


codecov bot commented Sep 1, 2025

Codecov Report

❌ Patch coverage is 97.29730% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.94%. Comparing base (e314fbd) to head (e475640).

Files with missing lines | Patch % | Lines
src/utils.jl | 97.24% | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #297      +/-   ##
==========================================
+ Coverage   94.64%   94.94%   +0.29%     
==========================================
  Files           9        9              
  Lines         654      752      +98     
==========================================
+ Hits          619      714      +95     
- Misses         35       38       +3     

ucabc46 and others added 6 commits September 1, 2025 15:41
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
try fixing benchmarking

upgrade manifest

add Test(try fixing benchmarking)

let ci refresh manifest

Revert "let ci refresh manifest"

This reverts commit 5385785.

try fixing benchmarking
@Angeladadd Angeladadd force-pushed the cgsun/copy_states_refine branch from 8ac5abb to 273cde9 Compare September 3, 2025 22:54
@Angeladadd Angeladadd marked this pull request as ready for review September 4, 2025 02:56
Member

@tkoskela tkoskela left a comment

Hi Chenge, I hope you are doing well. Sorry it's taken so long to review your code after marking the report. I think this is very good work and should definitely get merged into the upstream repo.

I'd like to have a few clarifying comments in some places that I've highlighted in my review comments. I think overall the optimisations you made should be the default behaviour, rather than something the user has to switch on. This would especially simplify the copy_states! and copy_states_dedup! functions that duplicate some code at the moment.

If Matt and Mose could also take a look at this, that would be great!

Member

Does this notebook reproduce the plots of your report? It's really nice to have it if it does! It could use some plain text description of what it's doing.

Comment on lines +17 to +19
/home/ucabc46/.julia/bin/mpiexecjl -n $SLURM_NNODES\
julia --project=. \
/home/ucabc46/exp/ParticleDA.jl/test/mpi_optimized_copy_states.jl -t /home/ucabc46/exp/ParticleDA.jl/test/output/dedup_threading_optimize_resampling/all_timers_$SLURM_NNODES.h5 -o
Member

Having absolute paths to your home directory here will force anyone else using this to manually edit all of them. To reduce the amount of manual editing a different user would have to do, I'd put the paths into variables and use an env variable for home.

Suggested change
- /home/ucabc46/.julia/bin/mpiexecjl -n $SLURM_NNODES\
- julia --project=. \
- /home/ucabc46/exp/ParticleDA.jl/test/mpi_optimized_copy_states.jl -t /home/ucabc46/exp/ParticleDA.jl/test/output/dedup_threading_optimize_resampling/all_timers_$SLURM_NNODES.h5 -o
+ PARTICLEDA_TEST_DIR=$HOME/exp/ParticleDA.jl/test
+ JULIA_DIR=$HOME/.julia
+ $JULIA_DIR/bin/mpiexecjl -n $SLURM_NNODES\
+ julia --project=. \
+ $PARTICLEDA_TEST_DIR/mpi_optimized_copy_states.jl -t $PARTICLEDA_TEST_DIR/output/dedup_threading_optimize_resampling/all_timers_$SLURM_NNODES.h5 -o

Comment on lines +15 to +17
/home/ucabc46/.julia/bin/mpiexecjl -n $SLURM_NNODES\
julia --project=. \
/home/ucabc46/exp/ParticleDA.jl/extra/weak_scaling/run_particleda.jl
Member

See previous comment

station_filename: "stationsW1.txt"
obs_noise_std: [0.01]

station_filename: "/home/ucabc46/exp/ParticleDA.jl/extra/weak_scaling/stationsW1.txt"
Member

Again, this should not point to your home dir. Can we use a relative path here?

Comment on lines +7 to +11
# Verify BLAS implementation is OpenBLAS
@assert occursin("openblas", string(BLAS.get_config()))

# Set size of thread pool for BLAS operations to 1
BLAS.set_num_threads(1)
Member

This feels sensible. We don't want BLAS oversubscribing threads. Could you put a comment here to explain why we require OpenBLAS?

particle_save_time_indices::V = []
seed::Union{Nothing, Int} = nothing
n_tasks::Int = -1
optimize_copy_states::Bool = false
Member

Since you demonstrated this works well, I would like to have it true by default.

@@ -1,5 +1,6 @@
[deps]
ChunkSplitters = "ae650224-84b6-46f8-82ea-d812ca08434e"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
Member

I don't see this being used anywhere. Do we need the test for stale dependencies, @giordano?

Comment on lines +92 to +97
    dedup::Bool = false
) where T

    if dedup
        return copy_states_dedup!(particles, buffer, resampling_indices, my_rank, nprt_per_rank, to)
    end
Member

There's a bit of code duplication here that is not ideal, and I think is confusing GitHub in showing the diff. I would be happy with replacing the old copy_states! with the new deduplicating version entirely. I think you showed the overhead of removing the duplicates is small in all realistic cases. That would make the code easier to read and maintain in the future.

Member

I'm not sure I understand why the timer object is in the arguments of this function. If I remember correctly how this works, the timer is updated by the @timeit_debug macro in the calling function (the main run_particle_filter function), so passing it as an argument is redundant.

Author

The timer object is in the arguments because we have a separate testing script for the copy_states! function.

end

particles .= buffer
function _categorize_wants(particles_want, my_rank::Int, nprt_per_rank::Int)
Member

particles_want is missing its type here

Comment on lines +254 to +258
if source_rank == my_rank
    get!(() -> Int[], local_copies, id) |> v -> push!(v, k)
else
    get!(() -> Int[], remote_copies, id) |> v -> push!(v, k)
end
Member

This bit is quite difficult to understand. Can you add some comments to explain what it does? If I understand correctly, you are pushing id into an element of either local_copies or remote_copies based on the outcome of the if statement.
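
To make the pattern concrete, here is a minimal sketch of what those lines appear to do, written out with comments and without the pipe. Only the `get!` idiom comes from the diff. The function name `categorize_wants`, the `AbstractVector{Int}` annotation, the loop, and the index-to-rank mapping (1-based particle ids, 0-based ranks, contiguous layout) are assumptions for illustration. Note that it is the slot index `k` that gets pushed, into the vector keyed by the particle `id`.

```julia
# Hypothetical reconstruction; only the `get!` idiom is taken from the diff.
function categorize_wants(particles_want::AbstractVector{Int}, my_rank::Int, nprt_per_rank::Int)
    # particle id => local slots k that want a copy of that particle
    local_copies = Dict{Int,Vector{Int}}()    # source particle lives on this rank
    remote_copies = Dict{Int,Vector{Int}}()   # source particle lives on another rank
    for (k, id) in enumerate(particles_want)
        # Assumed mapping from global particle id to owning rank.
        source_rank = (id - 1) ÷ nprt_per_rank
        target = source_rank == my_rank ? local_copies : remote_copies
        # get!(f, dict, key) returns dict[key], storing f() first if the key is absent,
        # so each particle id gets an empty Vector{Int} the first time it is seen.
        slots = get!(() -> Int[], target, id)
        push!(slots, k)
    end
    return local_copies, remote_copies
end

# Rank 0 of a run with 4 particles per rank wants particles 1, 1, 6, 6.
local_copies, remote_copies = categorize_wants([1, 1, 6, 6], 0, 4)
println(local_copies)   # Dict(1 => [1, 2])
println(remote_copies)  # Dict(6 => [3, 4])
```

Grouping slots by particle id is what lets each remote particle be requested once and then replicated locally into every slot listed for it.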
