Strategy for running MFC out-of-core on NVIDIA Grace-Hopper using Unified Memory #972
Merged
Commits (27)
2358d29 Add scripts for santis/alps, example case, and captures for UVM comms… (ntselepidis)
37d393b Add PREFER_GPU and rearrange update for out-of-core computation (ntselepidis)
693c7f4 Allow keeping q_cons_ts(2) on CPU using pinned allocations (ntselepidis)
7054b7b Modify PREFER_GPU macro (ntselepidis)
ee1277d Allow control in placement of IGR temps (ntselepidis)
4065c02 Do some clean up (ntselepidis)
cfb792c ENV Vars to case file options and code structure changes
cacc6b0 Fix some comments (ntselepidis)
884a4d9 Merge remote-tracking branch 'upstream/master' into nvidia (wilfonba)
b3fdbff test merge and add nv_uvm_out_of_core back (wilfonba)
51d7e90 Fix some allocs and deallocs in timesteppers (ntselepidis)
c553b78 Fix nv_uvm_out_of_core inconsistency and add to case file (ntselepidis)
f3b3851 Fix bug in 2nd order TVD RK introduced by merge (ntselepidis)
71b5976 Fix some comments (ntselepidis)
a4d6b38 Add note on binding script requirement for PREFER_GPU macro (ntselepidis)
acb2405 Flip nv_uvm_pref_gpu default to false (ntselepidis)
8fef22d Be explicit with unified memory compilation to stay robust in changes… (ntselepidis)
5e369c3 Add some changes to future proof the unified memory build (ntselepidis)
52c5608 Comment out calls to cudaGetErrorString (ntselepidis)
4ec8617 prepare for merge (wilfonba)
bd0adee Merge remote-tracking branch 'upstream/master' into nvidia (wilfonba)
37b1768 update capture (wilfonba)
e02e9f6 add fastmath flag and bug fix (wilfonba)
a6ff639 Fix typo in CMakeLists (ntselepidis)
457ae60 Replace host_pool with host in m_igr (ntselepidis)
a6116f2 Set cpus-per-task to 72 and update binding script (ntselepidis)
fb50e90 Add some more updates to the helper scripts (ntselepidis)
@@ -0,0 +1,101 @@
#!/usr/bin/env python3
import math
import json

N = 799
Nx = N
Ny = 2 * (N + 1) - 1
Nz = 2 * (N + 1) - 1

Re = 1600
L = 1
P0 = 101325
rho0 = 1
C0 = math.sqrt(1.4 * P0)
V0 = 0.1 * C0
mu = V0 * L / Re

cfl = 0.5
dx = 2 * math.pi * L / (Ny + 1)

dt = cfl * dx / (C0)

tC = L / V0
tEnd = 20 * tC

Nt = int(tEnd / dt)
Nt = 10  # overrides the step count computed on the previous line


# Configuring case dictionary
print(
    json.dumps(
        {
            "rdma_mpi": "T",
            # Logistics
            "run_time_info": "F",
            # Computational Domain Parameters
            "x_domain%beg": -math.pi * L,
            "x_domain%end": math.pi * L,
            "y_domain%beg": -math.pi * L,
            "y_domain%end": math.pi * L,
            "z_domain%beg": -math.pi * L,
            "z_domain%end": math.pi * L,
            "m": Nx,
            "n": Ny,
            "p": Nz,
            "cyl_coord": "F",
            "dt": dt,
            "t_step_start": 0,
            "t_step_stop": Nt,
            "t_step_save": int(Nt / 100),  # evaluates to 0 while Nt = 10
            # Simulation Algorithm Parameters
            "num_patches": 1,
            "model_eqns": 2,
            "num_fluids": 1,
            "time_stepper": 3,
            "bc_x%beg": -1,
            "bc_x%end": -1,
            "bc_y%beg": -1,
            "bc_y%end": -1,
            "bc_z%beg": -1,
            "bc_z%end": -1,
            "igr": "T",
            "igr_order": 5,
            "igr_iter_solver": 1,
            "num_igr_iters": 3,
            "num_igr_warm_start_iters": 3,
            "alf_factor": 10,
            "viscous": "T",
            # Formatted Database Files Structure Parameters
            "format": 1,
            "precision": 2,
            "prim_vars_wrt": "T",
            "omega_wrt(1)": "T",
            "omega_wrt(2)": "T",
            "omega_wrt(3)": "T",
            "qm_wrt": "T",
            "fd_order": 4,
            "parallel_io": "T",
            # Patch 1: Background (AIR - 2)
            "patch_icpp(1)%geometry": 9,
            "patch_icpp(1)%x_centroid": 0,
            "patch_icpp(1)%y_centroid": 0,
            "patch_icpp(1)%z_centroid": 0,
            "patch_icpp(1)%length_x": 2 * math.pi * L,
            "patch_icpp(1)%length_y": 2 * math.pi * L,
            "patch_icpp(1)%length_z": 2 * math.pi * L,
            "patch_icpp(1)%vel(1)": 0.0,
            "patch_icpp(1)%vel(2)": 0.0,
            "patch_icpp(1)%vel(3)": 0,
            "patch_icpp(1)%pres": 0.0,
            "patch_icpp(1)%hcid": 380,
            "patch_icpp(1)%alpha_rho(1)": 1,
            "patch_icpp(1)%alpha(1)": 1,
            # Fluids Physical Parameters
            "fluid_pp(1)%gamma": 1.0e00 / (1.4 - 1),
            "fluid_pp(1)%pi_inf": 0,
            "fluid_pp(1)%Re(1)": 1 / mu,
        }
    )
)
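To make the size of this example case concrete, the following is a minimal sketch that recomputes the quantities the case file derives before printing its dictionary. The function name `derived_quantities` and its keyword arguments are hypothetical and exist only for illustration. The cell count assumes that `m`, `n`, and `p` are highest cell indices, so each direction holds one more cell than the index; that reading is consistent with the `Ny + 1` divisor in the `dx` formula, but it is an assumption here rather than something this diff states.

```python
import math


def derived_quantities(N=799, L=1.0, P0=101325.0, cfl=0.5, nt_override=10):
    """Recompute the derived values from the case file (illustrative helper)."""
    Nx, Ny, Nz = N, 2 * (N + 1) - 1, 2 * (N + 1) - 1

    C0 = math.sqrt(1.4 * P0)          # same expression as the case file (rho0 = 1)
    V0 = 0.1 * C0                     # reference velocity
    dx = 2 * math.pi * L / (Ny + 1)   # spacing used for the time step
    dt = cfl * dx / C0
    t_end = 20 * L / V0               # 20 characteristic times

    return {
        # Assumption: index + 1 cells per direction.
        "cells": (Nx + 1) * (Ny + 1) * (Nz + 1),
        "dt": dt,
        # Step count before the case file overrides Nt with 10.
        "nt_full": int(t_end / dt),
        # What the case file passes as t_step_save once Nt = 10.
        "t_step_save": int(nt_override / 100),
    }


if __name__ == "__main__":
    for key, value in derived_quantities().items():
        print(f"{key}: {value}")
```

Under that assumption the grid has 800 x 1600 x 1600 cells. The sketch also shows that `int(Nt / 100)` evaluates to 0 once `Nt` is overridden to 10, which is worth keeping in mind if the override is changed.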
@@ -0,0 +1,24 @@
#!/usr/bin/env bash

# -------------------------------- #
# Binding for a single Santis node #
# -------------------------------- #

# Local rank
export local_rank="${OMPI_COMM_WORLD_LOCAL_RANK:-$SLURM_LOCALID}"

# Bind to GPU
export CUDA_VISIBLE_DEVICES="$local_rank"

# Bind to NIC
export MPICH_OFI_NIC_POLICY=USER
export MPICH_OFI_NIC_MAPPING="0:0; 1:1; 2:2; 3:3"

# Bind to cores (first core per socket)
physcores=(0 72 144 216)

#echo hostname: $(hostname), rank: $local_rank, cores: ${physcores[$local_rank]}, GPU: $CUDA_VISIBLE_DEVICES, NIC mapping: $MPICH_OFI_NIC_MAPPING

#set -x
numactl -l --all --physcpubind="${physcores[$local_rank]}" "$@"
#set +x
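The binding rule in the script above is a one-to-one map from local rank to GPU, NIC, and first core of a socket. A minimal sketch of that map follows, assuming four sockets of 72 cores each, as the `physcores` array suggests. The names `binding_for` and `numactl_command` are hypothetical helpers, not part of the PR.

```python
CORES_PER_SOCKET = 72   # spacing of physcores=(0 72 144 216) in the script
SOCKETS_PER_NODE = 4    # one local rank per socket (assumption from the script)


def binding_for(local_rank: int) -> dict:
    """Return the GPU, NIC, and core a local rank is bound to (illustrative)."""
    if not 0 <= local_rank < SOCKETS_PER_NODE:
        raise ValueError(f"local rank {local_rank} outside 0..{SOCKETS_PER_NODE - 1}")
    return {
        "gpu": local_rank,                            # CUDA_VISIBLE_DEVICES
        "nic": local_rank,                            # MPICH_OFI_NIC_MAPPING "r:r"
        "first_core": local_rank * CORES_PER_SOCKET,  # --physcpubind value
    }


def numactl_command(local_rank: int, app_argv: list) -> list:
    """Build the argv that the script's final numactl line would execute."""
    core = binding_for(local_rank)["first_core"]
    return ["numactl", "-l", "--all", f"--physcpubind={core}", *app_argv]


if __name__ == "__main__":
    for rank in range(SOCKETS_PER_NODE):
        print(rank, binding_for(rank))
```

Checking the rank range up front mirrors a failure the shell script would only hit implicitly: an out-of-range index into `physcores` expands to an empty string.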
@@ -0,0 +1,24 @@
#!/bin/bash

#set -x
set -euo pipefail

rank="${OMPI_COMM_WORLD_RANK:-$SLURM_PROCID}"

[[ -z "${NSYS_FILE+x}" ]] && NSYS_FILE=report.qdrep
[[ -z "${NSYS+x}" ]] && NSYS=0

if [[ "$NSYS" -ne 0 && "$rank" -eq 0 ]]; then
    exec nsys profile \
        --cpuctxsw=none -b none -s none \
        --event-sample=system-wide \
        --cpu-socket-events=61,71,265,273 \
        --cpu-socket-metrics=103,104 \
        --event-sampling-interval=10 \
        --trace=nvtx,openacc \
        --force-overwrite=true \
        -e NSYS_MPI_STORE_TEAMS_PER_RANK=1 \
        -o "$NSYS_FILE" "$@"
else
    exec "$@"
fi
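The wrapper above profiles only rank 0, and only when `NSYS` is set to a non-zero value; every other rank runs the application unchanged. The sketch below restates that decision as a pure function so it can be checked without launching anything. `wrap_command` and `NSYS_FLAGS` are hypothetical names, and the flag list is copied verbatim from the script rather than taken from the Nsight Systems documentation.

```python
# Flags copied from the wrapper script above.
NSYS_FLAGS = [
    "--cpuctxsw=none", "-b", "none", "-s", "none",
    "--event-sample=system-wide",
    "--cpu-socket-events=61,71,265,273",
    "--cpu-socket-metrics=103,104",
    "--event-sampling-interval=10",
    "--trace=nvtx,openacc",
    "--force-overwrite=true",
    "-e", "NSYS_MPI_STORE_TEAMS_PER_RANK=1",
]


def wrap_command(app_argv: list, env: dict) -> list:
    """Return the argv a rank would exec, given its environment (illustrative)."""
    # Like ${OMPI_COMM_WORLD_RANK:-$SLURM_PROCID}: an empty value falls through,
    # and a missing SLURM_PROCID raises, similar to `set -u` in the script.
    rank = int(env.get("OMPI_COMM_WORLD_RANK") or env["SLURM_PROCID"])
    nsys_enabled = int(env.get("NSYS", "0")) != 0
    nsys_file = env.get("NSYS_FILE", "report.qdrep")

    if nsys_enabled and rank == 0:
        return ["nsys", "profile", *NSYS_FLAGS, "-o", nsys_file, *app_argv]
    return list(app_argv)


if __name__ == "__main__":
    print(wrap_command(["./simulation"], {"SLURM_PROCID": "0", "NSYS": "1"}))
    print(wrap_command(["./simulation"], {"SLURM_PROCID": "1", "NSYS": "1"}))
```

Restricting the profiler to a single rank keeps one report file per job, which is presumably why the script gates on `rank` as well as on `NSYS`.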