Implement the surface flow component in Taichi to compare with the existing Cython implementation.
Potential upside: faster simulations by running the surface flow computation on a GPU.
Potential downside: a GPU implementation adds memory transfers between the accelerator and the host RAM. This matters most for time-varying input data (typically rainfall), for writing results to disk, and for exchanges with other non-GPU components (hydrology, SWMM, etc.).
Unified memory architectures such as Apple Silicon might have an advantage here, since the CPU and GPU share the same physical memory.
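To get a feel for the transfer cost, here is a back-of-envelope sketch. Every number in it is an assumption for illustration: the grid size, the number of arrays exchanged, how often they cross the host/device boundary, and the sustained bandwidth (16 GB/s is a hypothetical figure for a discrete GPU). The function names are hypothetical and not part of any existing code. The unified-memory case is idealized as zero copy, ignoring any synchronization overhead.

```python
# Back-of-envelope estimate of host <-> device copy time for a raster model.
# All inputs are illustrative assumptions, not measurements.


def transfer_bytes(nrows: int, ncols: int, n_arrays: int, bytes_per_value: int = 4) -> int:
    """Bytes moved when n_arrays full rasters (float32 by default) cross the boundary."""
    return nrows * ncols * n_arrays * bytes_per_value


def transfer_seconds(nbytes: int, bandwidth_gb_s: float) -> float:
    """Time to move nbytes at a sustained bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return nbytes / (bandwidth_gb_s * 1e9)


def total_transfer_seconds(
    nrows: int,
    ncols: int,
    n_arrays: int,
    n_exchanges: int,
    bandwidth_gb_s: float,
    unified_memory: bool = False,
) -> float:
    """Total copy time over a simulation.

    With unified memory, this simple model assumes no explicit copy is needed
    (an idealization: real systems may still pay synchronization costs).
    """
    if unified_memory:
        return 0.0
    per_exchange = transfer_seconds(transfer_bytes(nrows, ncols, n_arrays), bandwidth_gb_s)
    return per_exchange * n_exchanges


if __name__ == "__main__":
    # Hypothetical scenario: 4000 x 4000 float32 grid, rainfall uploaded and
    # water depth downloaded (2 arrays) at each of 10,000 exchanges.
    t = total_transfer_seconds(4000, 4000, n_arrays=2, n_exchanges=10_000, bandwidth_gb_s=16.0)
    print(f"{t:.1f} s spent copying")  # prints "80.0 s spent copying"
```

Under these assumptions each exchange moves 128 MB. The total cost grows linearly with how often rainfall or coupling data must be refreshed, so it can be reduced by keeping state resident on the GPU and exchanging only when inputs change.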