4 changes: 2 additions & 2 deletions dev/parm/config/gfs/config.resources.GAEAC6
@@ -10,10 +10,10 @@ case ${step} in
"fcst" | "efcs")
case "${CASE}" in
"C768")
-export tasks_per_node=144
+export tasks_per_node=180
Contributor

@GeorgeVandenberghe-NOAA had noted that the UFS did not run properly on C6 when allowed to use more than 144 tasks at high resolution. Is this issue fixed?

;;
"C1152")
-export tasks_per_node=144
+export tasks_per_node=160
Contributor

Can someone share test logs for this change? I know after the OS update we still have to make manual changes to the number of write grid tasks for gaea specifically, so we'd need to confirm this is working before updating as this will impact GFS retro tests.

Contributor Author

I cannot run C1152 on Ursa; it exceeds the maximum number of nodes I can request.

Contributor

This is changing the layout on gaea. We shouldn’t change the layout without testing and showing that this will work.

;;
*)
# Nothing to do for other resolutions
2 changes: 1 addition & 1 deletion dev/parm/config/gfs/config.resources.URSA
@@ -6,7 +6,7 @@ case ${step} in
"fcst" | "efcs")
case "${CASE}" in
"C768")
-export tasks_per_node=144
+export tasks_per_node=180
;;
"C1152")
export tasks_per_node=144
2 changes: 1 addition & 1 deletion dev/parm/config/gfs/config.ufs
@@ -271,7 +271,7 @@ case "${fv3_res}" in
export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE=15 #Note this should be 10 for WCOSS2
elif [[ "${RUN}" = "gfs" || "${RUN}" = "gcafs" ]]; then
export layout_x=12
-export layout_y=16
+export layout_y=15
Contributor

Is this in combination with the config.resources.<host> changes so that the tiles land on the same nodes? We might want input from @dpsarmie on this change, as it will affect all hosts. It would also need to be tested on WCOSS2.

Contributor

I agree that some justification would be needed to have this change (on top of thorough testing since this seems to have only been tested on Ursa and Gaea).

Contributor

768/15 = 51.2 so I also think this means we can't be guaranteed reproducibility.
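
For reference, the divisibility arithmetic in question can be sketched as below. This is only an illustration: `check_layout` is a hypothetical helper, not part of the workflow, and the "as even as possible" split it reports is an assumption about how the remainder cells would be distributed. The model's actual domain decomposition may place them differently, and the sketch says nothing about reproducibility itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of global-workflow): report how the cells
# along one edge of a tile would divide among the ranks along that edge,
# assuming an as-even-as-possible split.
check_layout() {
    local npts=$1 nranks=$2
    local base=$(( npts / nranks ))     # cells every rank gets
    local extra=$(( npts % nranks ))    # leftover cells, one each to 'extra' ranks
    if (( extra == 0 )); then
        echo "${npts}/${nranks}: even, ${base} cells per rank"
    else
        echo "${npts}/${nranks}: uneven, ${extra} ranks get $(( base + 1 )) cells, $(( nranks - extra )) get ${base}"
    fi
}

check_layout 768 12   # layout_x, unchanged in this PR
check_layout 768 16   # layout_y before this PR
check_layout 768 15   # layout_y proposed in this PR
```

Under that assumption, only the proposed `layout_y=15` leaves a remainder (768 = 15 × 51 + 3).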

Contributor Author

@DavidHuber-NOAA Yes, that is the idea: we hope that more of the communication can stay within a node.
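
A rough sketch of the tile-to-node arithmetic behind this, under stated assumptions: `nodes_per_tile` is a hypothetical helper, `tasks_per_node` is taken to count MPI ranks that are placed contiguously, and write-group tasks and threading are ignored. It is not a statement of how the scheduler actually places ranks on any host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of global-workflow): how one tile's compute
# ranks (layout_x * layout_y) map onto nodes of a given size.
# Assumes contiguous rank placement; ignores write tasks and threads.
nodes_per_tile() {
    local layout_x=$1 layout_y=$2 tasks_per_node=$3
    local tasks_per_tile=$(( layout_x * layout_y ))
    local whole=$(( tasks_per_tile / tasks_per_node ))   # completely filled nodes
    local rem=$(( tasks_per_tile % tasks_per_node ))     # ranks spilling onto another node
    if (( rem == 0 )); then
        echo "${tasks_per_tile} tasks per tile = ${whole} full node(s)"
    else
        echo "${tasks_per_tile} tasks per tile = ${whole} full node(s) + ${rem} tasks on a shared node"
    fi
}

nodes_per_tile 12 16 144   # C768 values before this PR
nodes_per_tile 12 15 180   # C768 values proposed in this PR
```

With the proposed values, 12 × 15 = 180 matches `tasks_per_node=180`, so under these assumptions each tile would fill exactly one node; with the previous values, a tile's 192 ranks would span more than one 144-task node.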

Contributor Author

@JessicaMeixner-NOAA From my understanding, 768/15 = 51.2 is not an integer, which should affect load balance but not reproducibility. Since C768 is a fairly high resolution, there should be plenty of work per task, so load balance is probably not as important as it is at low resolution. I think reducing cross-node communication matters more.
From my understanding, tasks per node determines how much memory each task can use, while the layout mostly affects load balance (and probably communication across cubed-sphere tiles).

Contributor

@dpsarmie can you weigh in here? It’s my understanding that the layout chosen here will not reproduce with other layouts, since the resolution is not evenly divisible by it. Maybe things have changed?

export WRITE_GROUP=4
export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE=20 #Note this should be 10 for WCOSS2
fi