I observed that when I reduced patch_size, my logged test predictions were cropped to patch_size. I think this is undesirable, since test metrics should be reported consistently regardless of patch_size: if I have test rasters, the metrics should be computed on the complete raster rather than on a patch. This is addressed by the load_all_crops arg, which in my opinion should default to True in testing, but perhaps I have misunderstood the intent? You might argue that the metrics should converge regardless of patch_size, but my dataset is very small, so I do not expect this to be the case.
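
To illustrate the concern, here is a minimal NumPy sketch of how a metric computed on one crop can differ from the same metric computed over the whole raster. It is not the library's implementation of load_all_crops, which I have not verified; the helper names (tile_origins, full_raster_accuracy, single_crop_accuracy), the toy raster, and the constant predictor are all hypothetical and chosen only to make the effect visible.

```python
import numpy as np


def tile_origins(height, width, patch_size):
    """Top-left (row, col) origins of patches that together cover the raster."""
    def starts(extent):
        if extent <= patch_size:
            return [0]
        s = list(range(0, extent - patch_size + 1, patch_size))
        if s[-1] + patch_size < extent:
            # Shift the last patch back so it ends exactly at the raster edge.
            s.append(extent - patch_size)
        return s

    return [(r, c) for r in starts(height) for c in starts(width)]


def full_raster_accuracy(pred_fn, image, target, patch_size):
    """Pixel accuracy over the complete raster, predicting patch by patch."""
    h, w = target.shape
    pred = np.zeros_like(target)
    for r, c in tile_origins(h, w, patch_size):
        patch = image[r:r + patch_size, c:c + patch_size]
        pred[r:r + patch_size, c:c + patch_size] = pred_fn(patch)
    return float((pred == target).mean())


def single_crop_accuracy(pred_fn, image, target, patch_size):
    """Pixel accuracy on the top-left crop only (the behaviour I want to avoid)."""
    patch = image[:patch_size, :patch_size]
    return float((pred_fn(patch) == target[:patch_size, :patch_size]).mean())


# Hypothetical toy raster: the left half is class 0, the right half is class 1.
image = np.zeros((10, 10), dtype=np.float32)
target = np.zeros((10, 10), dtype=np.int64)
target[:, 5:] = 1


def always_zero(patch):
    """Deliberately poor stand-in model that predicts class 0 everywhere."""
    return np.zeros(patch.shape, dtype=np.int64)


crop_acc = single_crop_accuracy(always_zero, image, target, patch_size=4)
full_acc = full_raster_accuracy(always_zero, image, target, patch_size=4)
print(f"single crop: {crop_acc:.2f}, full raster: {full_acc:.2f}")
```

In this toy case the single crop only sees class 0, so it reports a much better score than the full raster does. With only a few small test rasters, I would expect this kind of gap not to average out.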