Format videos to poseinterface spec. Extract clip function. #39
Format videos to poseinterface spec. Extract clip function. #39
poseinterface spec. Extract clip function. #39Conversation
|
|
||
| # Slice clip and save as mp4 | ||
| clip = video[start_frame : start_frame + duration] | ||
| clip_path = ( |
There was a problem hiding this comment.
This should be video_path.stem not video.stem, right?
clip_path = (
clips_dir / f"{video_path.stem}_start-{start_frame}_dur-{duration}.mp4"
)
There was a problem hiding this comment.
both are equivalent, video.stem gets the filename from the sleap-io Video object
There was a problem hiding this comment.
Not sure these are exactly equivalent. While trying this PR on some real data, I encountered the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[10], line 28
25 session_dir = benchmark_base_dir / split / project_name / sub_ses_prefix
27 for start_frame in start_frames:
---> 28 clip_path, clip_json = extract_clip(
29 video_path=(session_dir / f"{sub_ses_cam_prefix}.mp4"),
30 start_frame=start_frame,
31 duration=duration,
32 )
33 print(f"Extracted clip: {clip_path}, {clip_json}")
File ~/Code/NIU/poseinterface/poseinterface/clips.py:85, in extract_clip(video_path, start_frame, duration)
82 # Slice clip and save as mp4
83 clip = video[start_frame : start_frame + duration]
84 clip_path = (
---> 85 clips_dir / f"{video.stem}_start-{start_frame}_dur-{duration}.mp4"
86 )
87 sio.save_video(clip, clip_path, fps=video.fps)
89 # Generate cliplabels.json from the full video labels
AttributeError: 'Video' object has no attribute 'stem'The error goes away when using my version of clip_path (from my previous comment).
747985c to
3ade24a
Compare
niksirbi
left a comment
There was a problem hiding this comment.
Thanks @sfmig.
I tried using the new functions in #40. They mostly worked, but I've stumbled on a few issues that need to be addressed before merging (see inline comments). Happy to do another round of review after we resolve these (I skipped the tests in this round).
By the way, if we want the new public functions to appear in the API references, we have to add them manually in api_index.rst (we haven't yet set up the automatic machinery we have in movement).
| REENCODING_PARAMS = { | ||
| **EXPECTED_ENCODING, | ||
| "codec": "libx264", # overwrite with encoder to use | ||
| "crf": 25, |
There was a problem hiding this comment.
SLEAP's (and therefore OCTRON's) magic incantation uses crf 23. Any specific reason for going with 25 here?
There was a problem hiding this comment.
Releatedly, I wonder whether we should expose crf to the user as an optional kwarg in video_to_poseinterface
| @@ -1,10 +1,15 @@ | |||
| """Functions to convert annotations and videos to PoseInterface format.""" | |||
There was a problem hiding this comment.
Everywhere else we style package's name as lowercase, usually monospace a la movement. I recommend we don't go into CamelCase, unless we make an explicity decision to do so project-wide.
| """Functions to convert annotations and videos to PoseInterface format.""" | |
| """Functions to convert annotations and videos to ``poseinterface`` format.""" |
| if encoding != EXPECTED_ENCODING: | ||
| logging.warning( | ||
| f"Video encoding {encoding} does not match " | ||
| f"expected {EXPECTED_ENCODING}. Please reencode " |
There was a problem hiding this comment.
Since re-encoding happens automatically if needed. Should we reframe this as 'Will reencode' instead of 'Please reencode'?
Also since this is actually an expected action (documented in the docstring of video_to_poseinterface), I wonder whether this should be an INFO instead of WARNING.
| @@ -0,0 +1,185 @@ | |||
| """Functions to extract clips from poseinterface videos.""" | |||
There was a problem hiding this comment.
| """Functions to extract clips from poseinterface videos.""" | |
| """Functions to extract clips from ``poseinterface`` videos.""" |
| return clip_path, clip_json | ||
|
|
||
|
|
||
| def _extract_cliplabels(video_path, clips_dir, start_frame, duration): |
There was a problem hiding this comment.
Quoting our spec on cliplabels.json:
- Clip labels follow the same COCO keypoints format as frame labels, but with different conventions for image
idandfile_namevalues:- Each image
idmust be the 0-based index of the frame within the clip (i.e.0,1,2, ...), not the index in the session video.- Each
file_namemust follow the same pattern as frame image filenames, but without the extension. Theframefield in thefile_namemust correspond to the index of that frame in the session video.This means that each entry in the
imagesarray encodes two pieces of information: theidgives the local position within the clip, while theframefield infile_namegives the global position in the session video. Note that in both cases the indices are 0-based.For a clip starting at frame 1000 with a duration of 5 frames, the
imagesarray would be:[ {"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028}, {"id": 1, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1001", "width": 1300, "height": 1028}, {"id": 2, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1002", "width": 1300, "height": 1028}, {"id": 3, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1003", "width": 1300, "height": 1028}, {"id": 4, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1004", "width": 1300, "height": 1028} ]
There was a problem hiding this comment.
This function correctly selects the images corresponding to clip, but there is a step missing: the IDs of the extracted images must be changed to start with 0 within the extracted clip (i.e. subtract start_frame). The file_names should be left as they are, to keep a reference to the global frame index.
There was a problem hiding this comment.
A similar problem applies to the annotation ids inside the extracted cliplabels file. For a clip starting at frame 1000, the first annotation entry returned by the current implementation is as follows:
"annotations": [
{
"id": 1001,
"image_id": 1000,
"category_id": 1,
"keypoints": [529.621887207031, 494.971038818359, 2, 543.039184570313, 501.402648925781, 2, 544.258728027344, 482.781982421875, 2, 599.23681640625, 496.673095703125, 2, 593.133361816406, 527.087524414063, 2, 604.429321289063, 470.6630859375, 2, 669.785888671875, 507.377227783203, 2, 673.556396484375, 613.862365722656, 2],
"num_keypoints": 8,
"bbox": [529.621887207031, 470.6630859375, 143.934509277344, 143.199279785156],
"area": 20611.3180647455,
"iscrowd": 0
},Since our spec recommends that annotation IDs are 1-indexed, the "id" here should probably be 1, with the "image_id" being 0, as per my previous comment.
|
|
||
| # Slice clip and save as mp4 | ||
| clip = video[start_frame : start_frame + duration] | ||
| clip_path = ( |
There was a problem hiding this comment.
Not sure these are exactly equivalent. While trying this PR on some real data, I encountered the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[10], line 28
25 session_dir = benchmark_base_dir / split / project_name / sub_ses_prefix
27 for start_frame in start_frames:
---> 28 clip_path, clip_json = extract_clip(
29 video_path=(session_dir / f"{sub_ses_cam_prefix}.mp4"),
30 start_frame=start_frame,
31 duration=duration,
32 )
33 print(f"Extracted clip: {clip_path}, {clip_json}")
File ~/Code/NIU/poseinterface/poseinterface/clips.py:85, in extract_clip(video_path, start_frame, duration)
82 # Slice clip and save as mp4
83 clip = video[start_frame : start_frame + duration]
84 clip_path = (
---> 85 clips_dir / f"{video.stem}_start-{start_frame}_dur-{duration}.mp4"
86 )
87 sio.save_video(clip, clip_path, fps=video.fps)
89 # Generate cliplabels.json from the full video labels
AttributeError: 'Video' object has no attribute 'stem'The error goes away when using my version of clip_path (from my previous comment).
| clip_json = _extract_cliplabels( | ||
| video_path, clips_dir, start_frame, duration | ||
| ) |
There was a problem hiding this comment.
I think this step should be optional, as in, only extract cliplabels if a corresponding (appropriately named) .json file is to be found in the same folder as the video, otherwise just do the video clip. This will make extract_clip broadly useful in all sorts of contexts (beyond the specific purpose of generating clips for poseinterface benchmarks).
As things are, you can't really use extract_clip, unless you have the companion .json file.
| def _extract_cliplabels(video_path, clips_dir, start_frame, duration): | ||
| """Extract clip labels from the video cliplabels.json file.""" | ||
| # Read file with labels for the whole video | ||
| video_json = video_path.parent / f"{video_path.stem}_cliplabels.json" |
There was a problem hiding this comment.
I'm no longer certain about the suffix of this json file, see this comment #10 (comment) and the discussion in the PR review for #45.
| video_path, clips_dir, start_frame, duration | ||
| ) | ||
|
|
||
| return clip_path, clip_json |
There was a problem hiding this comment.
Would be nice to log an INFO message about this function's success before returning, to signal that it has actually worked.
Description
What is this PR
What does this PR do?
video_to_poseinterfaceto io module, to convert a video toposeinterfaceformat.cliplabels.jsonfile, given a video inposeinterfaceformat, its full video annotations incliplabels.jsonformat and a range of frames.extract-clipReferences
\
How has this PR been tested?
Tests pass locally and in CI.
Is this a breaking change?
No.
Does this PR require an update to the documentation?
Yes, docstrings.
Checklist: