
Commit 9b800e9

Merge pull request #749 from timkphd/gh-pages
gpubuildandrun.md - works with current modules, f90_advanced.md - rm …
2 parents 98d35c9 + 854e76b commit 9b800e9

File tree: 3 files changed, +108 / -63 lines

docs/Documentation/Development/Languages/Fortran/f90_advanced.md

Lines changed: 0 additions & 3 deletions

@@ -2822,16 +2822,13 @@ end

  ## References
  - [http://www.fortran.com/fortran/](http://www.fortran.com/fortran/) Pointer to everything Fortran
- - [http://meteora.ucsd.edu/~pierce/fxdr_home_page.html](http://meteora.ucsd.edu/~pierce/fxdr_home_page.html) Subroutines to do unformatted I/O across platforms, provided by David Pierce at UCSD
  - [http://www.nsc.liu.se/~boein/f77to90/a5.html](http://www.nsc.liu.se/~boein/f77to90/a5.html) A good reference for intrinsic functions
  - [https://wg5-fortran.org/N1551-N1600/N1579.pdf](https://wg5-fortran.org/N1551-N1600/N1579.pdf) New Features of Fortran 2003
  - [https://wg5-fortran.org/N1701-N1750/N1729.pdf](https://wg5-fortran.org/N1701-N1750/N1729.pdf) New Features of Fortran 2008
  - [http://www.nsc.liu.se/~boein/f77to90/](http://www.nsc.liu.se/~boein/f77to90/) Fortran 90 for the Fortran 77 Programmer
  - <b>Fortran 90 Handbook Complete ANSI/ISO Reference</b>. Jeanne Adams, Walt Brainerd, Jeanne Martin, Brian Smith, Jerrold Wagener
  - <b>Fortran 90 Programming</b>. T. Ellis, Ivor Philips, Thomas Lahey
  - [https://github.com/llvm/llvm-project/blob/master/flang/docs/FortranForCProgrammers.md](https://github.com/llvm/llvm-project/blob/master/flang/docs/FortranForCProgrammers.md)
- - [FFT stuff](../mkl/)
- - [Fortran 95 and beyond](../95/)

  <!--
  - [French Translation provided by Mary Orban](http://www.pkwteile.ch/science/avancee-fortran-90/)

docs/Documentation/Systems/Kestrel/Environments/Toolchains/intel.md

Lines changed: 10 additions & 9 deletions

@@ -17,7 +17,7 @@ These are the modules you will need for compiles:
  ```
  module load intel-oneapi-compilers
  module load intel-oneapi-mpi
- module load gcc/13.1.0
+ module load gcc
  ```

  Intel compilers use some gcc functionality, so we load gcc to provide a newer version of that compiler.

@@ -60,10 +60,11 @@ We can compile with the extra flag.

  ```
  mpiicc -diag-disable=10441 -O3 -g -fopenmp ex1.c -o gex_c
- ```
-
- #### 2b. Older compiler (icc) might not be avialable
-
+ ```
+
+
+ #### 2b. Older compiler (icc) might not be available
+
+
  Depending on the version of compilers loaded, the message shown above might be replaced with one saying that icc is no longer available. In this case you **MUST** use icx. There are two ways to do that, shown below.

  #### 3a. C with: Intel MPI and Intel C compiler, newer compiler (icx)

@@ -72,11 +73,11 @@ Depending on the version of compilers loaded the message shown above might be re
  export I_MPI_CC=icx
  mpiicc -O3 -g -fopenmp ex1.c -o ex_c
  ```
- Setting the environment variable tells mpiicc to use icx (the newer Intel compiler) instead of icc.
-
-
+ Setting the environment variable tells mpiicc to use icx (the newer Intel compiler) instead of icc.
+
+
  #### 3b. C with: Intel MPI and Intel C compiler, newer compiler (icx)
-
+
  ```
  mpiicx -O3 -g -fopenmp ex1.c -o ex_c
  ```
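A quick way to confirm which backend the wrappers above will actually call: Intel MPI's wrapper scripts accept a `-show` option that prints the underlying compile command without running it (check `mpiicc -help` if your version differs). The module names and `I_MPI_CC` come from the page itself; this is a sketch, not part of the commit's file content.

```bash
# Sketch: inspect the wrapper-to-backend mapping before building.
module load intel-oneapi-compilers
module load intel-oneapi-mpi
module load gcc

mpiicc -show         # shows whether the wrapper calls icc or icx
export I_MPI_CC=icx  # as above: force the newer Intel compiler
mpiicc -show         # should now report icx as the backend
mpiicx -show         # the second way: the icx-specific wrapper
```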

docs/Documentation/Systems/Kestrel/Environments/gpubuildandrun.md

Lines changed: 98 additions & 51 deletions

@@ -1,6 +1,8 @@
  # Building and Running on Kestrel's H100 GPU Nodes
  This page describes how to build and run on Kestrel's GPU nodes using several programming paradigms: pure CUDA programs, CUDA-aware MPI programs, MPI programs without CUDA, MPI programs with CUDA, MPI programs with OpenACC, and pure OpenACC programs.

+
+
  The examples are contained in a tarball available on Kestrel via the command:

  ```bash
@@ -21,6 +23,42 @@ sbatch --account=MYACCOUNT script
  ```
  where you need to provide your account name. This will run in about 22 minutes using 2 GPU nodes. Some of the examples require 2 nodes, but most will run on a single node.

+ ## Note about gcc/gfortran
+
+ Almost all compiling and running on a Linux system will at some point reference or otherwise use some portion of the GNU toolchain (gcc/gfortran/linker). Kestrel has many versions of gcc. These fall into three categories:
+
+ * Native to the operating system
+ * Built by Cray
+ * Built by NREL
+
+ You will also see modules for "mixed" versions. These are just duplicates of others and should not be loaded.
+
+ Here are some of the options:
+
+ #### module load gcc-native/12.1
+ * which gcc
+ * /opt/rh/gcc-toolset-12/root/usr/bin/gcc
+ * Native to the operating system
+
+ #### module load gcc/12.2.0
+ * which gcc
+ * /opt/cray/pe/gcc/12.2.0/bin/gcc
+ * Built by the vendor (Cray)
+
+ #### module load gcc-standalone/13.1.0
+ * which gcc
+ * /nopt/nrel/apps/gpu_stack/compilers/03-24/.../gcc-13.1.0.../bin/gcc
+ * Built by NREL
+
+ #### module load gcc-standalone/12.3.0
+ * which gcc
+ * /nopt/nrel/apps/cpu_stack/compilers/06-24/.../gcc-12.3.0.../bin/gcc
+ * Built by NREL
+
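To see which of these categories you actually end up with, something like the following works (module and path names come from the listing above; exact versions on Kestrel may change over time):

```bash
# Sketch: check what a given gcc module puts first on your PATH.
module avail gcc               # lists the gcc-native, gcc (Cray), and gcc-standalone modules
module load gcc-native/12.1    # or gcc/12.2.0, gcc-standalone/13.1.0, ...
which gcc                      # the path identifies the category, as in the listing above
gcc --version
```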
  ## Helper files

  There are a number of "helper" files shipped with the examples. The script *onnodes* is run while you have a job running. You specify the jobid and it will report what is running on each node owned by the job. This will include the core on which each task/thread is running. On GPU nodes it will also report what you have running on each GPU.
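A typical invocation, assuming *onnodes* sits in the unpacked example directory and takes the job ID as its argument, as described above (the job ID shown is a placeholder):

```bash
# Sketch: check task, core, and GPU placement for a running job.
sbatch --account=MYACCOUNT script   # prints something like "Submitted batch job 1234567"
./onnodes 1234567                   # placeholder job ID; reports per-node tasks, cores, and GPU usage
```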
@@ -170,6 +208,11 @@ export doits="./cudalib/factor/doit ./cudalib/fft/doit ./mpi/cudaaware/doit"


+
+
+
+
+
  ## The script

  Our script, shown below does the following:
@@ -195,7 +238,12 @@ Our script, shown below does the following:
  #SBATCH --output=output-%j.out
  #SBATCH --error=infor-%j.out

+ func () { typeset -f $1 || alias $1; }
+ func module > module.$SLURM_JOB_ID

+ # we "overload" srun here so that no single subjob that hits
+ # a problem can run too long. This should not happen.
+ alias srun="/nopt/slurm/current/bin/srun --time=00:05:00 $@"
  if echo $SLURM_SUBMIT_HOST | egrep "kl5|kl6" >> /dev/null ; then : ; else echo Run script from a GPU node; exit ; fi
  # a simple timer
  dt ()
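The two additions above are terse; a hedged reading: `func` prints the shell definition of its argument (a function body, or an alias if that fails), so `func module > module.$SLURM_JOB_ID` records how `module` is defined for this job, and the `srun` alias prepends a five-minute time limit to every later `srun` call in the script. In an interactive shell the idea looks like this (the `dt` body here is a stand-in, not the script's real timer):

```bash
# Sketch: func shows how a name is defined, as a function or an alias.
func () { typeset -f $1 || alias $1; }
alias srun="/nopt/slurm/current/bin/srun --time=00:05:00"
dt () { date +%s; }   # stand-in; the real dt() is defined in the script
func dt               # prints the dt function body
func srun             # prints the srun alias, showing the five-minute cap
```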
@@ -213,7 +261,7 @@ Our script, shown below does the following:
  cat $0 > script-$SLURM_JOB_ID.out

  #runs script to put our restore function in our environment
- . /nopt/nrel/apps/env.sh
+ . whack.sh
  module_restore

  #some possible values for gcc module
@@ -239,13 +287,12 @@ Our script, shown below does the following:
  t1=`dt`
  for x in $doits ; do
  dir=`dirname $x`
- echo ++++++++ $dir >&2
- echo ++++++++
+ echo ++++++++ $dir | tee >(cat 1>&2)
  echo $dir
  cd $dir
  tbegin=`dt`
  . doit | tee $SLURM_JOB_ID
- echo Runtime `dt $tbegin` $dir `dt $t1` total
+ echo Runtime `dt $tbegin` $dir `dt $t1` total | tee >(cat 1>&2)
  cd $startdir
  done
  echo FINISHED `dt $t1`
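The `tee >(cat 1>&2)` idiom introduced above writes the same line to both stdout and stderr, so the progress and Runtime markers land in both the `--output` and `--error` files named in the #SBATCH header. On its own:

```bash
# Sketch: duplicate one line to stdout and stderr via process substitution.
echo ++++++++ somedir | tee >(cat 1>&2)
# under the batch script, the stdout copy goes to output-%j.out
# and the stderr copy goes to infor-%j.out
```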
@@ -254,53 +301,53 @@
  mkdir -p /scratch/$USER/gputest/$SLURM_JOB_ID
  cp *out /scratch/$USER/gputest/$SLURM_JOB_ID
  # . cleanup
  ```

In this hunk, old lines 257-303 are deleted and re-added as new lines 304-350; the visible text is identical (presumably only whitespace/indentation changed), so the section is shown once:

## cuda/cray
Here we build and run a single-GPU code, stream.cu. This code is a standard benchmark that measures sustainable memory bandwidth on a GPU.

In this case we load PrgEnv-nvhpc/8.4.0, which requires cray-libsci/23.05.1.4. We compile with the "wrapper" compiler CC which, in this case, builds with NVIDIA's backend compiler. CC would "pull in" Cray's MPI if it were required.

We run on each GPU of each node in our allocation.

??? example "cuda/cray"
```bash
: Start from a known module state, the default
module_restore

: Load modules
#module unload PrgEnv-cray/8.5.0
#module unload nvhpc/24.1

if [ -z ${MYGCC+x} ]; then module load gcc ; else module load $MYGCC ; fi
ml PrgEnv-nvhpc/8.4.0
ml cray-libsci/23.05.1.4
ml binutils
: << ++++
Compile our program
CC, as well as cc and ftn, are wrapper compilers. Because
we have PrgEnv-nvhpc loaded they map to NVIDIA's compilers,
but they would use Cray MPI if this were an MPI program.
Note we can also use nvcc since this is not an MPI program.
++++

rm -rf ./stream.sm_90
CC -gpu=cc90 -cuda -target-accel=nvidia90 stream.cu -o stream.sm_90
# nvcc -std=c++11 -ccbin=g++ stream.cu -arch=sm_90 -o stream.sm_90

: Run on all of our nodes
nlist=`scontrol show hostnames | sort -u`
for l in $nlist ; do
  echo $l
  for GPU in 0 1 2 3 ; do
    : stream.cu will read the GPU on which to run from the command line
    srun -n 1 --nodes=1 -w $l ./stream.sm_90 -g $GPU
  done
  echo
done
```

## cuda/gccalso
