@@ -4,29 +4,27 @@ alwaysApply: true
---

# 0 Purpose & Scope
- Consolidated guidance for the MFC exascale, many-physics solver.
- Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when
- `.fpp` / `.f90` files are in view.
+ Consolidated guidance for the MFC exascale, many-physics solver.
+ Written primarily for Fortran/Fypp; the GPU and style sections matter only when `.fpp` / `.f90` files are in view.

---

# 1 Global Project Context (always)
- - **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
- - Sources `src/`, tests `tests/`, examples `examples/`.
- - Most sources are `.fpp`; CMake transpiles them to `.f90`.
- - **Fypp macros** live in `src/<subprogram>/include/` you should scan these first.
- `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
- - Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC**.
- - Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
- intrinsics.
- - Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
- file-level `include` files.
- - **Read the full codebase and docs *before* changing code.**
- Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the respository root `README.md`.
+ - **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
+ - Sources `src/`, tests `tests/`, examples `examples/`.
+ - Most sources are `.fpp`; CMake transpiles them to `.f90`.
+ - **Fypp macros** live in `src/<subprogram>/include/`; scan these first.
+ `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
+ - Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC** or **OpenMP**.
+ - Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern intrinsics.
+ - Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and file-level `include` files.
+ - **Read the full codebase and docs *before* changing code.**
+ - Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.

### Incremental-change workflow
- 1. Draft a step-by-step plan.
- 2. After each step, build:
+
+ 1. Draft a step-by-step plan.
+ 2. After each step, build:
```bash
./mfc.sh build -t pre_process simulation -j $(nproc)
```
@@ -49,34 +47,131 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
* Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
* Function → `f_<verb>_<noun>`
* Private helpers stay in the module; avoid nested procedures.
- * **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
- module/file ≤ 1000.
- * ≤ 6 arguments per routine; otherwise pass a derived-type “params” struct.
+ * **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000.
+ * ≤ 6 arguments per routine; otherwise pass a derived-type "params" struct.
* No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
- * Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
- / `pointer`.
+ * Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable` / `pointer`.
* Use `s_mpi_abort(<msg>)` for errors, not `stop`.
- * Mark OpenACC-callable helpers that are called from OpenACC parallel loops immediately after declaration:
+ * Mark GPU-callable helpers that are called from GPU parallel loops immediately after declaration:
```fortran
subroutine s_flux_update(...)
- !$acc routine seq
+ $:GPU_ROUTINE(function_name='s_flux_update', parallelism='[seq]')
...
end subroutine
```
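The argument-count and `intent` rules above can be sketched in plain Fortran as follows. This is a minimal illustration rather than MFC code: the module name `m_flux_sketch`, the type `t_flux_params`, and the local `wp` definition are all hypothetical stand-ins.

```fortran
module m_flux_sketch
    implicit none
    private
    public :: wp, t_flux_params, s_compute_flux

    ! Hypothetical stand-in for the project's working-precision kind
    integer, parameter :: wp = kind(1.0d0)

    ! Hypothetical "params" struct: bundles related scalars so the
    ! routine stays well under the six-argument limit
    type :: t_flux_params
        real(wp) :: dt = 0.0_wp
        real(wp) :: dx = 1.0_wp
        real(wp) :: speed = 0.0_wp
    end type t_flux_params

contains

    !> Computes a linear advective flux, speed*q, for each cell
    subroutine s_compute_flux(params, q, flux)
        type(t_flux_params), intent(in) :: params
        real(wp), dimension(:), intent(in) :: q
        real(wp), dimension(:), intent(out) :: flux

        flux = params%speed*q
    end subroutine s_compute_flux

end module m_flux_sketch

program p_flux_demo
    use m_flux_sketch
    implicit none
    type(t_flux_params) :: params
    real(wp) :: q(3), flux(3)

    params%speed = 2.0_wp
    q = [1.0_wp, 2.0_wp, 3.0_wp]
    call s_compute_flux(params, q, flux)
    print *, flux
end program p_flux_demo
```

Grouping the scalars in one derived type keeps the call site short and lets new parameters be added without touching every caller.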

---

- # 3 OpenACC Programming Guidelines (for kernels)
+ # 3 File & Module Structure

- Wrap tight loops with
+ - **File Naming**:
+   - `.fpp` files: Fypp preprocessed files that get translated to `.f90`
+   - Modules are named with `m_` prefix followed by feature name: `m_helper_basic`, `m_viscous`
+   - Primary program file is named `p_main.fpp`
+
+ - **Module Layout**:
+   - Start with Fypp include for macros: `#:include 'macros.fpp'`
+   - Header comments using `!>` style documentation
+   - `module` declaration with name matching filename
+   - `use` statements for dependencies
+   - `implicit none` statement
+   - `private` declaration followed by explicit `public` exports
+   - `contains` section
+   - Implementation of subroutines and functions
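
The layout above might look like the following skeleton. It is a sketch under stated assumptions: `m_sketch` and `s_scale_field` are hypothetical names, the Fypp include is shown only as a comment so the block compiles as plain Fortran, and `wp` is defined locally instead of being imported from the project.

```fortran
! In an MFC source file (m_sketch.fpp) the first line would be:
!   #:include 'macros.fpp'

!> @brief Hypothetical module illustrating the recommended layout
module m_sketch

    use iso_fortran_env, only: real64 ! stand-in for project `use` statements

    implicit none

    private
    public :: wp, s_scale_field

    ! Local stand-in for the project's working-precision parameter
    integer, parameter :: wp = real64

contains

    !> This procedure scales every entry of a field in place
    !! @param field Array to scale
    !! @param factor Multiplicative factor
    subroutine s_scale_field(field, factor)
        real(wp), dimension(:), intent(inout) :: field
        real(wp), intent(in) :: factor

        field = factor*field
    end subroutine s_scale_field

end module m_sketch
```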
+
+ ---
+
+ # 4 Fypp Macros
+
+ - **Fypp Directives**:
+   - Start with `#:` (e.g., `#:include`, `#:def`, `#:enddef`)
+   - Macros defined in `include/*.fpp` files
+   - Used for code generation, conditional compilation, and GPU offloading
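
As a small illustration of the directives listed above, the following `.fpp` sketch defines a macro with `#:def` / `#:enddef` and expands it with a direct call (`@:`). The macro `SWAP_VALUES` and the program name are hypothetical and not part of MFC; the file must be run through `fypp` before it is compiled.

```fortran
#! sketch.fpp: preprocess with `fypp sketch.fpp sketch.f90`, then compile sketch.f90
#:def SWAP_VALUES(a, b, tmp)
    ${tmp}$ = ${a}$
    ${a}$ = ${b}$
    ${b}$ = ${tmp}$
#:enddef SWAP_VALUES

program p_swap_demo
    implicit none
    integer :: x, y, t

    x = 1
    y = 2
    ! Expands to three assignment statements that exchange x and y
    @:SWAP_VALUES(x, y, t)
    print *, x, y
end program p_swap_demo
```

After preprocessing, the generated `.f90` contains only ordinary Fortran, so the macro adds no run-time cost.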
+
+ ---

+ # 5 Fypp Macros for GPU Acceleration Programming Guidelines (for GPU kernels)
+
+ - Do not use OpenACC or OpenMP directives directly.
+ - Instead, use the Fypp macros contained in `src/common/include/parallel_macros.fpp`.
+ - Documentation on how to use the Fypp macros for GPU offloading is available at <https://mflowcode.github.io/documentation/md_gpuParallelization.html>.
+
+ Wrap tight loops with
```fortran
- !$acc parallel loop gang vector default(present) reduction(...)
+ $:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
```
- * Add `collapse(n)` to merge nested loops when safe.
- * Declare loop-local variables with `private(...)`.
+ * Add `collapse=n` to merge nested loops when safe.
+ * Declare loop-local variables with `private='[...]'`.
* Allocate large arrays with `managed` or move them into a persistent
- `!$acc enter data` region at start-up.
+ `$:GPU_ENTER_DATA(...)` region at start-up.
* **Do not** place `stop` / `error stop` inside device code.
- * Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
+ * Must compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
GNU `gfortran` and Intel `ifx`/`ifort`.
+
+ - Example GPU macros include, among others:
+   - `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
+   - `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
+   - `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
+   - `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
+   - `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
+   - `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
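
A loop offloaded with these macros might look like the fragment below. It is a sketch only: the routine `s_scale_field` is hypothetical, the fragment assumes it sits inside an MFC `.fpp` module that includes the project macros and provides `wp`, and it assumes `q` is already resident on the device. It will not compile on its own, and the linked GPU documentation remains the authority on macro arguments.

```fortran
! Fragment of a hypothetical .fpp module procedure (requires MFC's macros)
subroutine s_scale_field(q, factor, m, n)
    integer, intent(in) :: m, n
    real(wp), dimension(0:m, 0:n), intent(inout) :: q
    real(wp), intent(in) :: factor
    integer :: j, k

    ! The two loops are perfectly nested and independent,
    ! so they can be merged with collapse=2
    $:GPU_PARALLEL_LOOP(collapse=2)
    do k = 0, n
        do j = 0, m
            q(j, k) = factor*q(j, k)
        end do
    end do
end subroutine s_scale_field
```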
+
+ ---
+
+ # 6 Documentation Style
+
+ - **Subroutine/Function Documentation**:
+   ```fortran
+   !> This procedure <description>
+   !! @param param_name Description of the parameter
+   !! @return Description of the return value (for functions)
+   ```
+   which conforms to the Doxygen Fortran format.
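
Applied to a concrete routine, the comment template above could read as follows. The module `m_doc_sketch`, the function `f_compute_mean`, and the local `wp` definition are hypothetical and serve only to show where each Doxygen tag goes.

```fortran
module m_doc_sketch
    implicit none
    private
    public :: wp, f_compute_mean

    ! Local stand-in for the project's working-precision parameter
    integer, parameter :: wp = kind(1.0d0)

contains

    !> This procedure computes the arithmetic mean of a one-dimensional array
    !! @param values Input samples; must contain at least one element
    !! @return Arithmetic mean of `values`
    pure function f_compute_mean(values) result(mean)
        real(wp), dimension(:), intent(in) :: values
        real(wp) :: mean

        mean = sum(values)/real(size(values), wp)
    end function f_compute_mean

end module m_doc_sketch
```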
+
+ # 7 Error Handling
+
+ - **Assertions**:
+   - Use the Fypp `ASSERT` macro for validating conditions
+   - Example: `@:ASSERT(predicate, message)`
+
+ - **Error Reporting**:
+   - Use `s_mpi_abort(error_message)` for error termination, not `stop`
+   - No `stop` / `error stop` inside device code
+
+ # 8 Memory Management
+
+ - **Allocation/Deallocation**:
+   - Use the Fypp macro `@:ALLOCATE(var1, var2)` for device-aware allocation
+   - Use the Fypp macro `@:DEALLOCATE(var1, var2)` for device-aware deallocation
+
+ # 9 Additional Observed Patterns
+
+ - **Derived Types**:
+   - Extensive use of derived types for encapsulation
+   - Use pointers within derived types (e.g., `pointer, dimension(:,:,:) => null()`)
+   - Clear documentation of derived type components
+
+ - **Pure & Elemental Functions**:
+   - Use `pure` and `elemental` attributes for side-effect-free functions
+   - Combine them for operations on arrays (`pure elemental function`)
+
+ - **Precision Handling**:
+   - Use the `wp` (working precision) parameter from `m_precision_select`
+   - Never hardcode precision with `real*8` or similar
+
+ - **Loop Optimization**:
+   - Favor array operations over explicit loops when possible
+   - Use the `collapse=N` macro argument to merge nested loops
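
Several of these patterns can be combined in one short, self-contained sketch. The names `m_patterns_sketch`, `t_field`, and `f_compute_pressure` are hypothetical, and `wp` is defined locally here, whereas MFC code would take it from `m_precision_select`.

```fortran
module m_patterns_sketch
    implicit none
    private
    public :: wp, t_field, f_compute_pressure

    ! Local stand-in for the project's working-precision parameter
    integer, parameter :: wp = kind(1.0d0)

    !> Hypothetical field container with a null-initialized pointer component
    type :: t_field
        real(wp), pointer, dimension(:, :, :) :: sf => null()
    end type t_field

contains

    !> This procedure evaluates the ideal-gas pressure (gamma - 1)*rho*e
    !! @param rho Density
    !! @param e Specific internal energy
    !! @param gamma Ratio of specific heats
    !! @return Pressure
    pure elemental function f_compute_pressure(rho, e, gamma) result(p)
        real(wp), intent(in) :: rho, e, gamma
        real(wp) :: p

        p = (gamma - 1.0_wp)*rho*e
    end function f_compute_pressure

end module m_patterns_sketch

program p_patterns_demo
    use m_patterns_sketch
    implicit none
    type(t_field) :: pres
    real(wp), allocatable :: rho(:, :, :), e(:, :, :)

    allocate (rho(2, 2, 2), e(2, 2, 2), pres%sf(2, 2, 2))
    rho = 1.0_wp
    e = 2.5_wp

    ! The elemental function is applied to whole arrays, no explicit loops
    pres%sf = f_compute_pressure(rho, e, 1.4_wp)
    print *, pres%sf(1, 1, 1)

    deallocate (pres%sf)
    deallocate (rho, e)
end program p_patterns_demo
```

Because the function is elemental, the same definition serves scalar and array arguments, which is what makes the array-syntax call in the demo possible.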
+
+ # 10 Fortran Practices to Avoid
+
+ - **Fixed Format**: Only free-form Fortran is used
+   - No column-position dependent code
+
+ - **Outdated Features**: Avoid older Fortran constructs such as:
+   - `equivalence` statements
+   - `data` statements (use initialization expressions)
+   - `character*N` (use `character(len=N)` instead)
+
+ - **Using same variable for multiple purposes**: Maintain single responsibility
+   - Each variable should have one clear purpose