Squashed commit of the following:

alexeedm · alexeedm · commit 769c4bf6d0ee · 2022-11-08T07:04:28.000-08:00
Updated Readme

    Fixing possible memory leaks and improving code generation for different data types

    Properly pass build target to cmake

    Changed the interface to accept multiple inputs

    Thread-safe pointer map, cosmetic renames
diff --git a/README.md b/README.md
@@ -1,21 +1,21 @@
 # Pytorch Fortran bindings
 
-The goal of this code is provide Fortran HPC codes with a simple way to use Pytorch deep learning framework.
+The goal of this code is to provide Fortran HPC codes with a simple way to use Pytorch deep learning framework.
 We want Fortran developers to take advantage of rich and optimized Torch ecosystem from within their existing codes.
 The code is very much work-in-progress right now and any feedback or bug reports are welcome.
 
 ## Features
 
-*  Define the model convinently in Python, save it and open in Fortran
+*  Define the model conveniently in Python, save it and open in Fortran
 *  Pass Fortran arrays into the model, run inference and get output as a native Fortran array
-*  Train the model from inside Fortran (limit support for now) and save it
+*  Train the model from inside Fortran and save it
 *  Run the model on the CPU or the GPU with the data also coming from the CPU or GPU
 *  Use OpenACC to achieve zero-copy data transfer for the GPU models
 *  Focus on achieving negligible performance overhead
 
 ## Building
 
-To assist with the build, we provide the Docker and [HPCCM](https://github.com/NVIDIA/hpc-container-maker) recipe for the container with all the necessary dependancies installed, see [container](container/)
+To assist with the build, we provide the Docker and [HPCCM](https://github.com/NVIDIA/hpc-container-maker) recipe for the container with all the necessary dependencies installed, see [container](container/)
 
 You'll need to mount a folder with the cloned repository into the container, cd into this folder from the running container and execute `./make_nvhpc.sh`, `./make_gcc.sh` or `./make_intel.sh` depending on the compiler you want to use.
 
@@ -40,4 +40,46 @@ install/bin/python_training  ../examples/python_training/model.py
 
 ## API
 
-We are working on documenting the API, for now please refer to the examples.
+We are working on documenting the full API. Please refer to the examples for more details.
+The bindings are provided through the following Fortran classes:
+
+### Class `torch_tensor`
+This class represents a light-weight Pytorch representation of a Fortran array. It does not own the data and only keeps the respective pointer.
+Supports arrays of rank up to 7 and datatypes `real32`, `real64`, `int32`, `int64`.
+Members:
+* `from_array(Fortran array or pointer :: array)` : create the tensor representation of a Fortran array.
+* `to_array(pointer :: array)` : create a Fortran pointer from the tensor. Use this to convert the data returned by a Pytorch model into a Fortran array.
+
+### Class `torch_tensor_wrap`
+This class wraps a few tensors or scalars that can be passed as input into Pytorch models.
+Arrays and scalars must be of types `real32`, `real64`, `int32` or `int64`.
+Members:
+* `add_scalar(scalar)` : add the scalar value into the wrapper.
+* `add_tensor(torch_tensor :: tensor)` : add the tensor into the wrapper.
+* `add_array(Fortran array or pointer :: array)` : create the tensor representation of a Fortran array and add it into the wrapper.
+
+
+### Class `torch_module`
+This class represents the traced Pytorch model, typically the result of a `torch.jit.trace` or `torch.jit.script` call from your Python script. This class is **not thread-safe**. For multi-threaded inference either create a threaded Pytorch model, or use a `torch_module` instance per thread (the latter could be less efficient).
+Members:
+* `load( character(*) :: filename, integer :: flags)` : load the module from a file. Flag can be set to `module_use_device` to enable the GPU processing.
+* `forward(torch_tensor_wrap :: inputs, torch_tensor :: output, integer :: flags)` : run the inference with Pytorch. The tensors and scalars from the `inputs` will be passed into Pytorch and the `output` will contain the result. `flags` is currently unused.
+* `create_optimizer_sgd(real :: learning_rate)` : create an SGD optimizer to use in the following training 
+* `train(torch_tensor_wrap :: inputs, torch_tensor :: target, real :: loss)` : perform a single training step where `target` is the target result and `loss` is the L2 squared loss returned by the optimizer
+* `save(character(*) :: filename)` : save the trained model
+
+### Class `torch_pymodule`
+This class represents the Pytorch Python script and requires the interpreter to be called. Only one `torch_pymodule` can be opened at a time due to a limitation of the Python interpreter. The overhead of calling this class is higher than with `torch_module`, but, unlike `torch_module%train`, it lets you train the Pytorch model with any optimizer, dropout, etc. The intended usage of this class is to run online training with a complex pipeline that cannot be expressed as TorchScript.
+Members:
+* `load( character(*) :: filename)` : load the module from a Python script
+* `forward(torch_tensor_wrap :: inputs, torch_tensor :: output)` : execute the `ftn_pytorch_forward` function from the Python script. The function is expected to accept tensors and scalars and return one tensor. The tensors and scalars from the `inputs` will be passed as arguments and the `output` will contain the result.
+* `train(torch_tensor_wrap :: inputs, torch_tensor :: target, real :: loss)` : execute the `ftn_pytorch_train` function from the Python script. The function is expected to accept tensors and scalars (with the last argument required to be the target tensor) and return a tuple of bool `is_completed` and float `loss`. `is_completed` is returned as the result of the `train` function, and `loss` is set according to the Python output. `is_completed` signals that the training has finished because a stopping criterion was met.
+* `save(character(*) :: filename)` : save the trained model
+
+## Changelog
+
+### v0.3
+* Changed interface: `forward` and `train` routines now accept `torch_tensor_wrap` instead of just `torch_tensor`. This allows a user to add multiple inputs consisting of tensors of different size and scalar values
+* Fixed possible small memory leaks due to tensor handles
+* Fixed build targets in the scripts; they now properly build Release versions by default
+* Added a short API help
diff --git a/examples/polynomial/polynomial.f90 b/examples/polynomial/polynomial.f90
@@ -73,8 +73,9 @@ program polynomial
     logical, parameter :: use_gpu = .false.
 #endif
     
-    type(torch_module)    :: torch_mod
-    type(torch_tensor)    :: in_tensor, out_tensor, target_tensor
+    type(torch_module)      :: torch_mod
+    type(torch_tensor)      :: out_tensor, target_tensor
+    type(torch_tensor_wrap) :: in_tensors
 
     real(real32) :: loss
     real(real32), dimension(1, batch_size) :: input, target
@@ -107,11 +108,12 @@ program polynomial
     end if
     call torch_mod%load(in_fname, flag)
     call torch_mod%create_optimizer_sgd(0.1)
+    call in_tensors%create
 
     !$acc data create (input, target) copyin(coeffs)
 
     !$acc host_data use_device(input)
-    call in_tensor%    from_array(input)
+    call in_tensors%add_array(input)
     !$acc end host_data
 
     !$acc host_data use_device(target)
@@ -125,12 +127,12 @@ program polynomial
         !$acc update device(input)
 
         call eval_polynomial(coeffs, input, target)
-        call torch_mod%train(in_tensor, target_tensor, loss)
+        call torch_mod%train(in_tensors, target_tensor, loss)
 
         if (mod(batch_idx, 100) == 0) then
             print "(A,I6,A,F9.6)", "Batch ",batch_idx," loss is ",loss
         end if
-        if (loss < 1e-3) exit
+        if (loss < 1e-4) exit
     end do
 
     if (batch_idx < max_batch_id) then
@@ -145,14 +147,14 @@ program polynomial
     !$acc update device(input)
     call eval_polynomial(coeffs, input, target)
 
-    call torch_mod%forward(in_tensor, out_tensor)
+    call torch_mod%forward(in_tensors, out_tensor)
     call out_tensor%to_array(output)
 
     !$acc update host(target, output)
     loss = sum( (target-output)**2 ) / batch_size
     
     print *, target(1,1:4), output(1,1:4)
-    print "(A,F9.6)", "L2 error of the trained model is ", loss
+    print "(A,F9.6)", "Mean squared error of the trained model is ", loss
 
     !$acc end data
 
diff --git a/examples/python_training/model.py b/examples/python_training/model.py
@@ -25,8 +25,10 @@
 
 def ftn_pytorch_forward(input):
     print('Hello from python')
+    for i in input:
+        print(i)
     return torch.tensor([[1., -1.], [1., -1.]])
 
-def ftn_pytorch_train(input):
+def ftn_pytorch_train(input, target):
     print('train from python')
     return (True, 42.0)
diff --git a/examples/python_training/python_training.f90 b/examples/python_training/python_training.f90
@@ -27,9 +27,10 @@ program python_training
 
     integer :: n
     type(torch_pymodule) :: torch_pymod
-    type(torch_tensor) :: t_in, t_out, t_target
+    type(torch_tensor) :: t_out, t_target
+    type(torch_tensor_wrap) :: tw_in
 
-    real(real32) :: input(224, 224, 3, 10), target(224)
+    real(real32) :: input(2, 3), target(2), factor
     real(real32), pointer :: output(:,:)
     real(real32) :: loss
     logical :: is_completed
@@ -46,18 +47,23 @@ program python_training
     allocate(character(arglen) :: filename)
     call get_command_argument(number=1, value=filename, status=stat)
 
-    input = 1.0
-    call t_in%from_array(input)
+    input(1,:) = 1.0
+    input(2,:) = 2.0
+    factor = 3.0
+
+    call tw_in%create
+    call tw_in%add_array(input)
+    call tw_in%add_scalar(factor)
     call t_target%from_array(target)
 
     call torch_pymod%load(filename)
     ! will call Python function ftn_pytorch_forward(input) -> output
-    call torch_pymod%forward(t_in, t_out)
+    call torch_pymod%forward(tw_in, t_out)
     call t_out%to_array(output)
     print *, output
 
     ! will call Python function ftn_pytorch_train(input, target) -> (is_completed, loss)
-    is_completed = torch_pymod%train(t_in, t_target, loss)
+    is_completed = torch_pymod%train(tw_in, t_target, loss)
     print *, is_completed, loss
 
 end program
diff --git a/examples/resnet_forward/resnet_forward.f90 b/examples/resnet_forward/resnet_forward.f90
@@ -27,7 +27,8 @@ program resnet_forward
 
     integer :: n
     type(torch_module) :: torch_mod
-    type(torch_tensor) :: in_tensor, out_tensor
+    type(torch_tensor_wrap) :: input_tensors
+    type(torch_tensor) :: out_tensor
 
     real(real32) :: input(224, 224, 3, 10)
     real(real32), pointer :: output(:, :)
@@ -45,9 +46,10 @@ program resnet_forward
     call get_command_argument(number=1, value=filename, status=stat)
 
     input = 1.0
-    call in_tensor%from_array(input)
+    call input_tensors%create
+    call input_tensors%add_array(input)
     call torch_mod%load(filename)
-    call torch_mod%forward(in_tensor, out_tensor)
+    call torch_mod%forward(input_tensors, out_tensor)
     call out_tensor%to_array(output)
 
     print *, output(1:5, 1)
diff --git a/make_gnu.sh b/make_gnu.sh
@@ -38,25 +38,25 @@ mkdir -p $BUILD_PATH/build_proxy $BUILD_PATH/build_fortproxy $BUILD_PATH/build_e
 # c++ wrappers 
 (
     cd $BUILD_PATH/build_proxy 
-    cmake -DOPENACC=$OPENACC -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH -DTORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST ../../src/proxy_lib
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH -DTORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST ../../src/proxy_lib
+    cmake --build . --parallel
     make install
 )
 
 # fortran bindings
 (
     export PATH=$NVPATH:$PATH 
     cd $BUILD_PATH/build_fortproxy
-    cmake -DOPENACC=$OPENACC -DCMAKE_Fortran_COMPILER=gfortran -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$INSTALL_PATH/lib ../../src/f90_bindings/
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_Fortran_COMPILER=gfortran -DCMAKE_PREFIX_PATH=$INSTALL_PATH/lib ../../src/f90_bindings/
+    cmake --build . --parallel
     make install
 )
 
 # fortran examples
 (
     export PATH=$NVPATH:$PATH 
     cd $BUILD_PATH/build_example
-    cmake -DOPENACC=$OPENACC -DCMAKE_Fortran_COMPILER=gfortran -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH ../../examples/
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_Fortran_COMPILER=gfortran ../../examples/
+    cmake --build . --parallel
     make install
 )
diff --git a/make_intel.sh b/make_intel.sh
@@ -41,25 +41,25 @@ mkdir -p $BUILD_PATH/build_proxy $BUILD_PATH/build_fortproxy $BUILD_PATH/build_e
 # c++ wrappers 
 (
     cd $BUILD_PATH/build_proxy 
-    cmake -DOPENACC=$OPENACC -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH ../../src/proxy_lib
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH ../../src/proxy_lib
+    cmake --build . --parallel
     make install
 )
 
 # fortran bindings
 (
     export PATH=$NVPATH:$PATH 
     cd $BUILD_PATH/build_fortproxy
-    cmake -DOPENACC=$OPENACC -DCMAKE_Fortran_COMPILER=ifort -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$INSTALL_PATH/lib ../../src/f90_bindings/
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_Fortran_COMPILER=ifort -DCMAKE_PREFIX_PATH=$INSTALL_PATH/lib ../../src/f90_bindings/
+    cmake --build . --parallel
     make install
 )
 
 # fortran examples
 (
     export PATH=$NVPATH:$PATH 
     cd $BUILD_PATH/build_example
-    cmake -DOPENACC=$OPENACC -DCMAKE_Fortran_COMPILER=ifort -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH ../../examples/
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_Fortran_COMPILER=ifort ../../examples/
+    cmake --build . --parallel
     make install
 )
diff --git a/make_nvhpc.sh b/make_nvhpc.sh
@@ -45,25 +45,25 @@ mkdir -p $BUILD_PATH/build_proxy $BUILD_PATH/build_fortproxy $BUILD_PATH/build_e
 # c++ wrappers 
 (
     cd $BUILD_PATH/build_proxy 
-    cmake -DOPENACC=$OPENACC -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH -DTORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST ../../src/proxy_lib
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_CXX_COMPILER=g++ -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH -DTORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST ../../src/proxy_lib
+    cmake --build . --parallel
     make install
 )
 
 # fortran bindings
 (
     export PATH=$NVPATH:$PATH 
     cd $BUILD_PATH/build_fortproxy
-    cmake -DOPENACC=$OPENACC -DCMAKE_Fortran_COMPILER=nvfortran -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_PREFIX_PATH=$INSTALL_PATH/lib ../../src/f90_bindings/
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_Fortran_COMPILER=nvfortran -DCMAKE_PREFIX_PATH=$INSTALL_PATH/lib ../../src/f90_bindings/
+    cmake --build . --parallel
     make install
 )
 
 # fortran examples
 (
     export PATH=$NVPATH:$PATH 
     cd $BUILD_PATH/build_example
-    cmake -DOPENACC=$OPENACC -DCMAKE_Fortran_COMPILER=nvfortran -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH ../../examples/
-    cmake --build . --config $CONFIG --parallel
+    cmake -DOPENACC=$OPENACC -DCMAKE_BUILD_TYPE=$CONFIG -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_Fortran_COMPILER=nvfortran ../../examples/
+    cmake --build . --parallel
     make install
 )
diff --git a/src/f90_bindings/gen.py b/src/f90_bindings/gen.py
diff --git a/src/f90_bindings/torch_ftn.f90.templ b/src/f90_bindings/torch_ftn.f90.templ
diff --git a/src/proxy_lib/torch_proxy.cpp b/src/proxy_lib/torch_proxy.cpp
diff --git a/src/proxy_lib/torch_proxy.h b/src/proxy_lib/torch_proxy.h
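
The README hunk above describes the contract that `torch_pymodule` expects from the Python script: `ftn_pytorch_forward` returns one tensor, and `ftn_pytorch_train` returns a tuple of `is_completed` and `loss`. The sketch below illustrates that contract only and is not the project's code. It assumes, as the `for i in input` loop in `examples/python_training/model.py` suggests, that the wrapped inputs arrive as one iterable; how the bindings actually pack the arguments is not shown in this diff. Plain Python lists stand in for tensors so the sketch runs without Pytorch, and the scaling "model", `LOSS_TOLERANCE`, `_split_inputs` and `_mse` are hypothetical names introduced for illustration.

```python
# Hypothetical stand-in for a model script loaded by torch_pymodule.
# Nested lists replace torch tensors so that the sketch runs without Pytorch.

LOSS_TOLERANCE = 1e-4  # assumed stopping criterion, not taken from the project


def _split_inputs(inputs):
    """Separate array-like inputs from scalar inputs (assumed packing)."""
    arrays = [x for x in inputs if isinstance(x, list)]
    scalars = [x for x in inputs if not isinstance(x, list)]
    return arrays, scalars


def ftn_pytorch_forward(inputs):
    """Accept arrays and scalars, return a single array.

    Toy model: multiply every element of the first array by the first scalar.
    """
    arrays, scalars = _split_inputs(inputs)
    factor = scalars[0] if scalars else 1.0
    return [[factor * value for value in row] for row in arrays[0]]


def _mse(output, target):
    """Mean squared error between two equally shaped nested lists."""
    diffs = [
        (o - t) ** 2
        for o_row, t_row in zip(output, target)
        for o, t in zip(o_row, t_row)
    ]
    return sum(diffs) / len(diffs)


def ftn_pytorch_train(inputs, target):
    """Return (is_completed, loss), the tuple described in the README."""
    loss = _mse(ftn_pytorch_forward(inputs), target)
    return (loss < LOSS_TOLERANCE, loss)
```

Calling `ftn_pytorch_train([[[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]], 3.0], target)` uses the same values that `python_training.f90` adds through `add_array` and `add_scalar`. In a real script the lists would be `torch.Tensor` objects, and the training function would also run an optimizer step before reporting the loss.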
