The goal of this project was to detect the move sequence of a boulder problem from bouldering videos.
We were provided with a dataset of videos. We first selected the good videos and stabilised them, then labeled all the usable ones for further preprocessing such as cropping, cutting, or taking a screenshot for visualization. Afterward, we ran a pose estimation algorithm to obtain the coordinates of the climbers' body parts. From this data we detected the moves of the boulder problem and recovered their sequence using a clustering algorithm. Finally, we implemented visualization functions to display our results.
We were quite pleased with our results: when the climber is fully in the frame, the program outputs the correct move sequence.
Link to more videos to test the program: https://drive.google.com/drive/folders/1S0hvjk2Zq7UDENHD-uOba7GCrgQQlS4s?usp=sharing
To install the following dependencies, you can run `pip3 install -r requirements.txt`:
- `python` ~3.8
- `numpy`
- `pandas`
- `ffmpeg` with `vidstab` to stabilize the videos.
- `mediapipe` to estimate the climbers' poses.
- `imageio` for generating the GIF visualization.
- `gspread` for the Google Drive integration.
- `opencv-python` for the visualization.
- `openpyxl` for the xlsx import.
- `scikit-learn` to compute the clusters for the holds.
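For reference, a plausible `requirements.txt` matching the list above could look like the sketch below (unpinned; exact versions are up to you). Note that `ffmpeg` is a system binary that must be built with `libvidstab` and installed separately, not a pip package:

```text
numpy
pandas
mediapipe
imageio
gspread
opencv-python
openpyxl
scikit-learn
```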
The file structure to follow for the project to work out of the box is the following:

```
ml-project-2-bouldering_team1
│   boulder_problems.xlsx
│   colab_notebook.ipynb
│   move_sequence.py
│   pose_estimation.py
│   README.md
│   run.py
│   utils.py
│
└───preprocessing
│       cropping.py
│       screengrab.py
│       stabilization.py
│
└───videos
    └───boulder_1_01
    │   └───fail
    │   │       [vid_file]
    │   │       [vid_file]
    │   │       ...
    │   └───success
    │           [vid_file]
    │           [vid_file]
    │           ...
    └───boulder_1_02
    │   └───fail
    │   │       [vid_file]
    │   │       [vid_file]
    │   │       ...
    │   └───success
    │           [vid_file]
    │           [vid_file]
    │           ...
    └───...
```
- Stabilization is done using FFmpeg in conjunction with vidstab. We first compute the camera's motion with the following command:

  ```
  ffmpeg -hide_banner -loglevel error -i [vid_file] -vf vidstabdetect=result=[vid_file].trf -f null -
  ```

  Afterwards, we can stabilize the video with:

  ```
  ffmpeg -hide_banner -loglevel error -i [vid_file] -vf vidstabtransform=input=[vid_file].trf:smoothing=0 [vid_file]_STAB.MOV
  ```
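  These two passes can be driven from Python with `subprocess` — a minimal sketch, assuming your ffmpeg build includes `libvidstab` (the `stabilize` helper is illustrative, not the project's `stabilization.py`):

  ```python
  import subprocess

  def stabilize(vid):
      """Two-pass vidstab: detect the camera motion, then apply the transform."""
      # pass 1: write the camera transformations to [vid].trf
      subprocess.run(
          ["ffmpeg", "-hide_banner", "-loglevel", "error", "-i", vid,
           "-vf", f"vidstabdetect=result={vid}.trf", "-f", "null", "-"],
          check=True)
      # pass 2: apply the transform and write the stabilized video
      subprocess.run(
          ["ffmpeg", "-hide_banner", "-loglevel", "error", "-i", vid,
           "-vf", f"vidstabtransform=input={vid}.trf:smoothing=0",
           f"{vid}_STAB.MOV"],
          check=True)
  ```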
- Cropping in time: for each video, we define in the Excel sheet the time interval to keep.
- Cropping in space: for each video, we define in the Excel sheet the part of the frame to keep.
- Screenshots: these will help with the visualization later on. The time at which to take the screenshot can be defined in the Excel sheet; otherwise the last frame of the video is used (a minimal frame-grab sketch follows this list).
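As referenced above, grabbing such a frame can be done with OpenCV. A minimal sketch, assuming the `[vid_file]_STAB.MOV_SCREEN.jpg` naming from the output list further down (`grab_frame` is an illustrative helper, not necessarily the project's `screengrab.py`):

```python
import cv2

def grab_frame(video_path, t_ms=None):
    """Save one frame of `video_path`; the last frame if no time is given."""
    cap = cv2.VideoCapture(video_path)
    if t_ms is not None:
        cap.set(cv2.CAP_PROP_POS_MSEC, t_ms)          # seek to the requested time
    else:
        last = cap.get(cv2.CAP_PROP_FRAME_COUNT) - 1  # seek to the last frame
        cap.set(cv2.CAP_PROP_POS_FRAMES, last)
    ok, frame = cap.read()
    cap.release()
    if ok:
        cv2.imwrite(video_path + "_SCREEN.jpg", frame)
    return ok
```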
We used MediaPipe for pose estimation, as it worked well and ran fast in our testing.
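In essence, the per-frame extraction looks like the following sketch (using MediaPipe's `solutions.pose` API; the project's JSON serialization details are omitted):

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
cap = cv2.VideoCapture("[vid_file]_STAB.MOV")  # placeholder path

with mp_pose.Pose(min_detection_confidence=0.5) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB, OpenCV delivers BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # e.g. the left wrist, one of the 33 landmarks of the model
            lw = results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_WRIST]
            print(lw.x, lw.y, lw.visibility)  # normalized image coordinates
cap.release()
```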
To compute the move sequence, we first check which extremities have not moved by more than a certain threshold over a short period of time; this gives us rough coordinates of the holds. We then run a clustering algorithm on this point cloud to obtain the centroids and a better estimate of the holds' locations (here, by hold, we mean any place where the climber is either holding a hold or taking support on the wall).
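A sketch of this idea follows; the window length, the stillness threshold, and the choice of DBSCAN (which avoids fixing the number of holds in advance) are illustrative assumptions, not necessarily the project's exact parameters:

```python
import numpy as np
from sklearn.cluster import DBSCAN

WINDOW = 15          # frames an extremity must stay still (assumed)
STILL_THRESH = 0.01  # max spread in normalized coordinates (assumed)

def candidate_holds(track):
    """track: (n_frames, 2) array of one extremity's (x, y) positions.
    Returns the mean position of every window in which it barely moved."""
    pts = []
    for i in range(len(track) - WINDOW):
        win = track[i:i + WINDOW]
        if np.linalg.norm(win.max(axis=0) - win.min(axis=0)) < STILL_THRESH:
            pts.append(win.mean(axis=0))
    return pts

def hold_centroids(extremity_tracks):
    """extremity_tracks: one (n_frames, 2) array per hand/foot,
    e.g. read back from the pose-estimation JSON files."""
    points = np.array([p for t in extremity_tracks for p in candidate_holds(t)])
    labels = DBSCAN(eps=0.03, min_samples=3).fit_predict(points)
    # label -1 is DBSCAN's noise class; every other label is one hold
    return [points[labels == k].mean(axis=0) for k in set(labels) if k != -1]
```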
`run.py` accepts the following arguments:

- `--table_path`: the path of the Excel file, default is `boulder_problems`.
- `--n_sheet`: number of sheets to process in the Excel file, default is 1.
- `--path`: the path of the climbing videos (the top folder; in our structure it would be `videos/`).
- `--n_boulders`: with folders of the format `boulder_i_*` in `videos/`, sets the max `i` to explore for the stabilization, default is 7.
- `--vid_list`: optional list of videos to stabilize, default is `None`.
- `--no_prep`: skips all the preprocessing steps (stabilization, cropping and screengrabbing).
- `--stab`: stabilizes all the unstabilized videos.
- `--crop`: crops the videos in time (according to the numbers in the Excel sheet).
- `--screen`: generates the screenshot for each of the videos.
- `--pose`: estimates the pose using MediaPipe and generates JSON files containing the keypoint coordinates.
- `--output_video`: outputs the videos with the pose estimation.
- `--move`: generates the move sequence using the JSON files of the pose estimation.
- `--gif`: saves the move sequence as a GIF.
- `--normal_screens`: grabs screenshots of the non-MediaPipe videos, defaults to `False`.
- `--redo_screens`: reruns the screengrabbing on all videos.
- `--redo_moves`: reruns the move sequence computations.
- `--verbose`, `-v`: sets the verbosity level, `-v` for info and `-vv` for debugging.
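For reference, these flags map to an `argparse` setup along the following lines (a sketch of the interface only, not the actual `run.py`):

```python
import argparse

parser = argparse.ArgumentParser(description="Boulder move-sequence pipeline")
parser.add_argument("--table_path", default="boulder_problems")
parser.add_argument("--n_sheet", type=int, default=1)
parser.add_argument("--path", default="videos/")
parser.add_argument("--n_boulders", type=int, default=7)
parser.add_argument("--vid_list", nargs="*", default=None)
# boolean switches
for flag in ("--no_prep", "--stab", "--crop", "--screen", "--pose",
             "--output_video", "--move", "--gif", "--normal_screens",
             "--redo_screens", "--redo_moves"):
    parser.add_argument(flag, action="store_true")
parser.add_argument("--verbose", "-v", action="count", default=0)
args = parser.parse_args()
```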
After running everything, for each video we end up with:

```
[vid_file]
[vid_file].trf
[vid_file]_STAB.MOV
[vid_file]_STAB.MOV_SCREEN.jpg
[vid_file]_STAB.MOV_POSE.json
[vid_file]_STAB.MOV_POSE.mp4
[vid_file]_STAB.MOV_MOVE_SEQ.jpg
[vid_file]_STAB.MOV_MOVE_SEQ.gif
```
The command `python3 -m run --stab --crop --screen --pose --move` gives the following results:
And the command `python3 -m run --no_prep --move --gif`, run on a preprocessed dataset, yields the following kind of GIF:

