Copy States Optimization #297
Conversation
Codecov Report
Additional details and impacted files:

    @@ Coverage Diff @@
    ##             main     #297      +/-   ##
    ==========================================
    + Coverage   94.64%   94.94%   +0.29%
    ==========================================
      Files           9        9
      Lines         654      752      +98
    ==========================================
    + Hits          619      714      +95
    - Misses         35       38       +3
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commits:
- try fixing benchmarking
- upgrade manifest
- add Test(try fixing benchmarking)
- let ci refresh manifest
- Revert "let ci refresh manifest" (this reverts commit 5385785)
- try fixing benchmarking (this reverts commit 2b10cee)
Force-pushed from 8ac5abb to 273cde9.
Hi Chenge, I hope you are doing well. Sorry it's taken so long to review your code after marking the report. I think this is very good work and should definitely get merged into the upstream repo.
I'd like a few clarifications in the places I've highlighted in my review comments. Overall, I think the optimisations you made should be the default behaviour rather than something the user has to switch on. This would especially simplify the copy_states! and copy_states_dedup! functions, which currently duplicate some code.
If Matt and Mose could also take a look at this, that would be great!
Does this notebook reproduce the plots of your report? It's really nice to have it if it does! It could use some plain text description of what it's doing.
    /home/ucabc46/.julia/bin/mpiexecjl -n $SLURM_NNODES \
        julia --project=. \
        /home/ucabc46/exp/ParticleDA.jl/test/mpi_optimized_copy_states.jl -t /home/ucabc46/exp/ParticleDA.jl/test/output/dedup_threading_optimize_resampling/all_timers_$SLURM_NNODES.h5 -o
Having absolute paths to your home directory here will force anyone else using this to manually edit all of them. To reduce the amount of manual editing a different user would have to do, I'd put the paths into variables and use an env variable for home.
Suggested change — replace

    /home/ucabc46/.julia/bin/mpiexecjl -n $SLURM_NNODES \
        julia --project=. \
        /home/ucabc46/exp/ParticleDA.jl/test/mpi_optimized_copy_states.jl -t /home/ucabc46/exp/ParticleDA.jl/test/output/dedup_threading_optimize_resampling/all_timers_$SLURM_NNODES.h5 -o

with

    PARTICLEDA_TEST_DIR=$HOME/exp/ParticleDA.jl/test
    JULIA_DIR=$HOME/.julia
    $JULIA_DIR/bin/mpiexecjl -n $SLURM_NNODES \
        julia --project=. \
        $PARTICLEDA_TEST_DIR/mpi_optimized_copy_states.jl -t $PARTICLEDA_TEST_DIR/output/dedup_threading_optimize_resampling/all_timers_$SLURM_NNODES.h5 -o
    /home/ucabc46/.julia/bin/mpiexecjl -n $SLURM_NNODES \
        julia --project=. \
        /home/ucabc46/exp/ParticleDA.jl/extra/weak_scaling/run_particleda.jl
See previous comment
    station_filename: "stationsW1.txt"
    obs_noise_std: [0.01]

changed to

    station_filename: "/home/ucabc46/exp/ParticleDA.jl/extra/weak_scaling/stationsW1.txt"
Again, this should not point to your home dir. Can we use a relative path here?
    # Verify BLAS implementation is OpenBLAS
    @assert occursin("openblas", string(BLAS.get_config()))

    # Set size of thread pool for BLAS operations to 1
    BLAS.set_num_threads(1)
This feels sensible. We don't want BLAS oversubscribing threads. Could you put a comment here to explain why we require OpenBLAS?
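If it helps, here is a hedged sketch of what such an explanatory comment block might look like. The oversubscription rationale comes from this review thread; the exact wording of the OpenBLAS justification is our assumption, not something established here.

```julia
using LinearAlgebra

# ParticleDA already parallelises across MPI ranks and Julia threads, so a
# multi-threaded BLAS would oversubscribe the available cores. We therefore
# pin the BLAS thread pool to 1. We first assert that we are running the
# default OpenBLAS build, because that is the configuration this thread-pool
# setting was benchmarked with (assumption: other vendor BLAS libraries may
# manage their thread pools differently).
@assert occursin("openblas", lowercase(string(BLAS.get_config())))
BLAS.set_num_threads(1)
```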
    particle_save_time_indices::V = []
    seed::Union{Nothing, Int} = nothing
    n_tasks::Int = -1
    optimize_copy_states::Bool = false
Since you demonstrated this works well, I would like to have it true by default.
    @@ -1,5 +1,6 @@
    [deps]
    ChunkSplitters = "ae650224-84b6-46f8-82ea-d812ca08434e"
    Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
I don't see this being used anywhere. Do we need the test for stale dependencies, @giordano?
        dedup::Bool = false
    ) where T

        if dedup
            return copy_states_dedup!(particles, buffer, resampling_indices, my_rank, nprt_per_rank, to)
        end
There's a bit of code duplication here that is not ideal, and I think it is confusing GitHub's diff view. I would be happy with replacing the old copy_states! with the new deduplicating version entirely. I think you showed the overhead of removing the duplicates is small in all realistic cases, and that would make the code easier to read and maintain in the future.
I'm not sure I understand why the timer object is in the arguments of this function. If I still remember how this works, in the main run_particle_filter function the timer will be updated by the @timeit_debug macro in the calling function, so passing it as an argument is redundant.
The timer object is in the arguments because we have a separate testing script for the copy_states! function.
    end

        particles .= buffer
    function _categorize_wants(particles_want, my_rank::Int, nprt_per_rank::Int)
particles_want is missing its type annotation here.
    if source_rank == my_rank
        get!(() -> Int[], local_copies, id) |> v -> push!(v, k)
    else
        get!(() -> Int[], remote_copies, id) |> v -> push!(v, k)
    end
This bit is quite difficult to understand. Can you add some comments to explain what it does? If I understand correctly, you are pushing k into a vector of either local_copies or remote_copies, keyed by id, based on the outcome of the if statement.
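For what it's worth, the get!/push! pattern can be illustrated in isolation. This is a standalone sketch with made-up ranks and indices, not the actual ParticleDA code; the ownership rule (id - 1) ÷ nprt_per_rank is our assumption about how a particle index maps to a rank.

```julia
# Map each wanted particle id to the list of local slots k that want a copy
# of it, split by whether the particle currently lives on this rank.
local_copies  = Dict{Int, Vector{Int}}()
remote_copies = Dict{Int, Vector{Int}}()

my_rank, nprt_per_rank = 0, 4
particles_want = [2, 2, 5, 7]   # hypothetical resampled particle indices

for (k, id) in enumerate(particles_want)
    source_rank = (id - 1) ÷ nprt_per_rank  # assumed owner of particle id
    if source_rank == my_rank
        # get! fetches the vector for id, creating an empty one on first use,
        # then we append the destination slot k to it.
        push!(get!(() -> Int[], local_copies, id), k)
    else
        push!(get!(() -> Int[], remote_copies, id), k)
    end
end

# local_copies  == Dict(2 => [1, 2])
# remote_copies == Dict(5 => [3], 7 => [4])
```

Note that duplicated wants (id 2 above) collapse into a single dictionary entry with multiple destination slots, which is exactly what enables the deduplicated communication.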
Description
This PR introduces a two-phase optimisation to address communication bottlenecks in the copy_states routine during distributed resampling:
Issue
#116
Testing