I noticed that after training a heatmap_multiview_transformer model, only one video from each view would be processed afterwards.
This is because the find_video_files_for_views function returns only one video per view, and those videos are not even guaranteed to come from the same session.
Additionally, it's not possible to run inference on videos from different views at the same time using litpose predict.
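
To illustrate the session-matching problem described above, here is a minimal sketch of how videos could be grouped so that each view's video comes from the same session. It is not the existing find_video_files_for_views implementation: the function name group_videos_by_session is hypothetical, and it assumes a filename convention in which the view name appears as one underscore-delimited token of the file stem (e.g. sess1_top.mp4), which may not match every dataset.

```python
from collections import defaultdict
from pathlib import Path


def group_videos_by_session(video_paths, view_names):
    """Return {session: {view: path}} for sessions that have a video for every view.

    Hypothetical helper. Assumes each file stem contains exactly one view name
    as an underscore-delimited token, e.g. "sess1_top.mp4".
    """
    sessions = defaultdict(dict)
    for path in map(Path, video_paths):
        tokens = path.stem.split("_")
        matches = [view for view in view_names if view in tokens]
        if len(matches) != 1:
            # Skip files with no recognizable view or an ambiguous one.
            continue
        view = matches[0]
        # The session key is the stem with the view token removed.
        session = "_".join(t for t in tokens if t != view)
        sessions[session][view] = path
    # Keep only sessions in which every view is present.
    return {
        session: views
        for session, views in sorted(sessions.items())
        if set(views) == set(view_names)
    }
```

Under these assumptions, every complete session yields one video per view, so all sessions can be processed rather than a single, possibly mismatched, set of videos.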