docs/Documentation/Development/Languages/Fortran/f90_advanced.md
Lines changed: 0 additions & 3 deletions
@@ -2822,16 +2822,13 @@ end
## References

[http://www.fortran.com/fortran/](http://www.fortran.com/fortran/) Pointer to everything Fortran

[http://meteora.ucsd.edu/~pierce/fxdr_home_page.html](http://meteora.ucsd.edu/~pierce/fxdr_home_page.html) Subroutines to do unformatted I/O across platforms, provided by David Pierce at UCSD

[http://www.nsc.liu.se/~boein/f77to90/a5.html](http://www.nsc.liu.se/~boein/f77to90/a5.html) A good reference for intrinsic functions

[https://wg5-fortran.org/N1551-N1600/N1579.pdf](https://wg5-fortran.org/N1551-N1600/N1579.pdf) New Features of Fortran 2003

[https://wg5-fortran.org/N1701-N1750/N1729.pdf](https://wg5-fortran.org/N1701-N1750/N1729.pdf) New Features of Fortran 2008

[http://www.nsc.liu.se/~boein/f77to90/](http://www.nsc.liu.se/~boein/f77to90/) Fortran 90 for the Fortran 77 Programmer

- <b>Fortran 90 Handbook Complete ANSI/ISO Reference</b>. Jeanne Adams, Walt Brainerd, Jeanne Martin, Brian Smith, Jerrold Wagener
- <b>Fortran 90 Programming</b>. T. Ellis, Ivor Philips, Thomas Lahey
#### 2b. Older compiler (icc) might not be available

Depending on the version of compilers loaded, the message shown above might be replaced with one saying that icc is no longer available. In this case you **MUST** use icx. There are two ways to do that, shown below.

#### 3a. C with: Intel MPI and Intel C compiler, newer compiler (icx)
@@ -72,11 +73,11 @@ Depending on the version of compilers loaded the message shown above might be re
```
export I_MPI_CC=icx
mpiicc -O3 -g -fopenmp ex1.c -o ex_c
```

Setting the environment variable tells mpiicc to use icx (the newer Intel compiler) instead of icc.

#### 3a. C with: Intel MPI and Intel C compiler, newer compiler (icx)
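The body of this section is not shown in the excerpt above. As a rough, assumed sketch of building and running the C example with the icx backend (the module names, task counts, and the srun line here are illustrative assumptions, not taken from this page):

```bash
# Assumed illustrative sketch: build the C example with Intel MPI + icx, then run it.
module load intel-oneapi-mpi intel-oneapi-compilers   # module names are assumptions
export I_MPI_CC=icx                   # make the mpiicc wrapper call icx instead of icc
mpiicc -O3 -g -fopenmp ex1.c -o ex_c  # same compile line as in the excerpt above
srun --nodes=1 --ntasks=4 ./ex_c      # example launch; adjust tasks and nodes as needed
```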
docs/Documentation/Systems/Kestrel/Environments/gpubuildandrun.md
Lines changed: 98 additions & 51 deletions
@@ -1,6 +1,8 @@
# Building and Running on Kestrel's H100 GPU nodes.

This page describes how to build and run on Kestrel's GPU nodes using several programming paradigms. There are pure Cuda programs, Cuda aware MPI programs, MPI programs without Cuda, MPI programs with Cuda, MPI programs with Openacc, and pure Openacc programs.

The examples are contained in a tarball available on Kestrel via the command:

where you need to provide your account name. This will run in about 22 minutes using 2 GPU nodes. Some of the examples require 2 nodes but most will run on a single node.
## Note about gcc/gfortran

Almost all compiling/running on a linux system will at some point reference or in some way use some portion of the GNU (gcc/gfortran/linker) system. Kestrel has many versions of gcc. These fall into three categories:

* Native to the Operating system
* Built by Cray
* Built by NREL

You will also see modules for "mixed" versions. These are just duplicates of others and should not be loaded.
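To see which of these gcc variants are installed as modules before picking one, a check along the following lines works; nothing here is specific to the examples:

```bash
# List every gcc module the module system exposes, then load one explicitly.
# Skip the duplicate "mixed" modules mentioned above.
module avail gcc
module load gcc
```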
There are a number of "helper" files shipped with the examples. The script *onnodes* is run while you have a job running. You specify the jobid and it will report what is running on each node owned by the job. This will include the core on which each task/thread is running. On GPU nodes it will also report what you have running on each GPU.
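For example, a typical invocation while a job is running might look like the following; the job id is made up, and the script is assumed to be in the current directory:

```bash
# Find the id of your running job, then ask onnodes what is running on each
# of that job's nodes (and, on GPU nodes, on each GPU).
squeue -u $USER       # note the JOBID column
./onnodes 1234567     # replace 1234567 with your actual job id
```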
@@ -195,7 +238,12 @@ Our script, shown below does the following:
```
#SBATCH --output=output-%j.out
#SBATCH --error=infor-%j.out

+func () { typeset -f $1 || alias $1; }
+func module > module.$SLURM_JOB_ID

+# we "overload" srun here so that if any one subjob has
+# a problem it does not run too long. This should not happen.
+alias srun="/nopt/slurm/current/bin/srun --time=00:05:00 $@"
if echo $SLURM_SUBMIT_HOST | egrep "kl5|kl6" >> /dev/null ; then : ; else echo Run script from a GPU node; exit ; fi
# a simple timer
dt ()
```
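The body of dt is not shown in this hunk. Judging only from how it is called later (no argument to get a start time, dt $t1 to get elapsed seconds), a helper along these lines would fit; this is a guess at the shape, not the script's actual code:

```bash
# Hypothetical stand-in for the dt timer used in the script:
# no argument -> print the current time in seconds; one argument -> print seconds since it.
dt () {
    local now=`date +%s`
    if [ -z "$1" ] ; then echo $now ; else echo $((now - $1)) ; fi
}
```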
@@ -213,7 +261,7 @@ Our script, shown below does the following:
```
cat $0 > script-$SLURM_JOB_ID.out

#runs script to put our restore function in our environment
-. /nopt/nrel/apps/env.sh
+. whack.sh
module_restore

#some possible values for gcc module
```
@@ -239,13 +287,12 @@ Our script, shown below does the following:
```
t1=`dt`
for x in $doits ; do
dir=`dirname $x`
-echo ++++++++ $dir >&2
-echo ++++++++
+echo ++++++++ $dir | tee >(cat 1>&2)
echo $dir
cd $dir
tbegin=`dt`
. doit | tee $SLURM_JOB_ID
-echo Runtime `dt $tbegin` $dir `dt $t1` total
+echo Runtime `dt $tbegin` $dir `dt $t1` total | tee >(cat 1>&2)
cd $startdir
done
echo FINISHED `dt $t1`
```
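The tee >(cat 1>&2) construct introduced above writes a line to stdout and also copies it to stderr, so the message lands in both the --output and --error files. A minimal standalone illustration:

```bash
# Write one message to stdout and, via process substitution, duplicate it to stderr.
echo "++++++++ example" | tee >(cat 1>&2)
```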
@@ -254,53 +301,53 @@ Our script, shown below does the following:
```
mkdir -p /scratch/$USER/gputest/$SLURM_JOB_ID
cp *out /scratch/$USER/gputest/$SLURM_JOB_ID
# . cleanup
```

## cuda/cray

Here we build and run a single GPU code stream.cu. This code is a standard benchmark that measures the floating point performance for a GPU.

In this case we are loading PrgEnv-nvhpc/8.4.0, which requires cray-libsci/23.05.1.4. We compile with the "wrapper" compiler CC which, in this case, builds with NVIDIA's backend compiler. CC would "pull in" Cray's MPI if it were required.

We run on each GPU of each node in our allocation.

??? example "cuda/cray"
    ```bash
    : Start from a known module state, the default
    module_restore

    : Load modules
    #module unload PrgEnv-cray/8.5.0
    #module unload nvhpc/24.1

    if [ -z ${MYGCC+x} ]; then module load gcc ; else module load $MYGCC ; fi
    ml PrgEnv-nvhpc/8.4.0
    ml cray-libsci/23.05.1.4
    ml binutils
    : << ++++
    Compile our program
    CC as well as cc, and ftn are wrapper compilers. Because
    we have PrgEnv-nvidia loaded they map to Nvidia's compilers
    but would use Cray MPI if this were an MPI program.
    Note we can also use nvcc since this is not an MPI program.
    ++++

    rm -rf ./stream.sm_90
    CC -gpu=cc90 -cuda -target-accel=nvidia90 stream.cu -o stream.sm_90

    : stream.cu will read the GPU on which to run from the command line
    srun -n 1 --nodes=1 -w $l ./stream.sm_90 -g $GPU
    done
    echo
    done
    ```
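The loop that wraps the srun line is not included in this excerpt. A hedged sketch of such a loop, assuming four GPUs per node and using scontrol to expand the node list (both are assumptions for illustration):

```bash
# Hedged sketch only: run the benchmark once per GPU on every node in the job.
# The GPU count of 4 and the use of scontrol here are illustrative assumptions.
for l in `scontrol show hostnames $SLURM_NODELIST` ; do
    for GPU in 0 1 2 3 ; do
        srun -n 1 --nodes=1 -w $l ./stream.sm_90 -g $GPU
    done
    echo
done
```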