Format videos to poseinterface spec. Extract clip function. #39

Open
sfmig wants to merge 15 commits into main from video-utils

Conversation

@sfmig
Member

@sfmig sfmig commented Mar 27, 2026

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

What does this PR do?

  • Added video_to_poseinterface to io module, to convert a video to poseinterface format.
    • It renames videos following spec and reencodes them if required.
  • Basic util to extract a clip and the corresponding cliplabels.json file, given a video in poseinterface format, its full video annotations in cliplabels.json format and a range of frames.
    • Exposed as entry point extract-clip

References

How has this PR been tested?

Tests pass locally and in CI.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

Yes, docstrings.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@sfmig sfmig changed the base branch from main to auto-file-name March 27, 2026 13:57
Comment thread poseinterface/clips.py

# Slice clip and save as mp4
clip = video[start_frame : start_frame + duration]
clip_path = (
Member

This should be video_path.stem not video.stem, right?

    clip_path = (
        clips_dir / f"{video_path.stem}_start-{start_frame}_dur-{duration}.mp4"
    )

Member Author

Both are equivalent: video.stem gets the filename from the sleap-io Video object.

Member

Not sure these are exactly equivalent. While trying this PR on some real data, I encountered the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 28
     25 session_dir = benchmark_base_dir / split / project_name / sub_ses_prefix
     27 for start_frame in start_frames:
---> 28     clip_path, clip_json = extract_clip(
     29         video_path=(session_dir / f"{sub_ses_cam_prefix}.mp4"),
     30         start_frame=start_frame,
     31         duration=duration,
     32     )
     33     print(f"Extracted clip: {clip_path}, {clip_json}")

File ~/Code/NIU/poseinterface/poseinterface/clips.py:85, in extract_clip(video_path, start_frame, duration)
     82 # Slice clip and save as mp4
     83 clip = video[start_frame : start_frame + duration]
     84 clip_path = (
---> 85     clips_dir / f"{video.stem}_start-{start_frame}_dur-{duration}.mp4"
     86 )
     87 sio.save_video(clip, clip_path, fps=video.fps)
     89 # Generate cliplabels.json from the full video labels

AttributeError: 'Video' object has no attribute 'stem'

The error goes away when using my version of clip_path (from my previous comment).
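
A minimal sketch of the path-based naming, assuming video_path is a pathlib.Path. The helper name build_clip_path is hypothetical and not part of the PR; it only illustrates that the stem comes from the path rather than from the Video object:

```python
from pathlib import Path


def build_clip_path(
    video_path: Path, clips_dir: Path, start_frame: int, duration: int
) -> Path:
    """Build the clip's output path from the source video's filename stem."""
    # Path.stem is the filename without its final extension
    return clips_dir / f"{video_path.stem}_start-{start_frame}_dur-{duration}.mp4"


clip_path = build_clip_path(
    Path("session/sub-M708149_ses-20200317_cam-topdown.mp4"),
    Path("session/clips"),
    start_frame=1000,
    duration=5,
)
print(clip_path.name)
```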

@sfmig sfmig changed the base branch from auto-file-name to main April 1, 2026 13:58
@sfmig sfmig force-pushed the video-utils branch 2 times, most recently from 747985c to 3ade24a Compare April 1, 2026 14:24
@sfmig sfmig marked this pull request as ready for review April 1, 2026 14:51
@sfmig sfmig requested a review from niksirbi April 1, 2026 14:51
Member

@niksirbi niksirbi left a comment

Thanks @sfmig.

I tried using the new functions in #40. They mostly worked, but I've stumbled on a few issues that need to be addressed before merging (see inline comments). Happy to do another round of review after we resolve these (I skipped the tests in this round).

By the way, if we want the new public functions to appear in the API references, we have to add them manually in api_index.rst (we haven't yet set up the automatic machinery we have in movement).

Comment thread poseinterface/io.py
REENCODING_PARAMS = {
    **EXPECTED_ENCODING,
    "codec": "libx264",  # overwrite with encoder to use
    "crf": 25,
Member

SLEAP's (and therefore OCTRON's) magic incantation uses crf 23. Any specific reason for going with 25 here?

Member

Relatedly, I wonder whether we should expose crf to the user as an optional kwarg in video_to_poseinterface.
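
One way this could look, as a rough sketch only. The contents of EXPECTED_ENCODING below are placeholders (the real constant lives in poseinterface/io.py and may have different keys), and build_reencoding_params and DEFAULT_CRF are hypothetical names:

```python
# Placeholder for the module-level constant; the real keys may differ.
EXPECTED_ENCODING = {"codec": "h264", "pixelformat": "yuv420p"}

DEFAULT_CRF = 23  # the value mentioned above as SLEAP's choice


def build_reencoding_params(crf: int = DEFAULT_CRF) -> dict:
    """Return encoder settings, letting the caller override the CRF."""
    # libx264 accepts CRF values from 0 (lossless) to 51 for 8-bit video
    if not 0 <= crf <= 51:
        raise ValueError(f"crf must be between 0 and 51, got {crf}")
    return {
        **EXPECTED_ENCODING,
        "codec": "libx264",  # overwrite with encoder to use
        "crf": crf,
    }


print(build_reencoding_params())
print(build_reencoding_params(crf=25))
```

video_to_poseinterface could then accept crf as a keyword argument and pass it through, keeping a single default in one place.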

Comment thread poseinterface/io.py
@@ -1,10 +1,15 @@
"""Functions to convert annotations and videos to PoseInterface format."""
Member

Everywhere else we style the package's name in lowercase, usually monospace a la movement. I recommend we don't switch to CamelCase, unless we make an explicit decision to do so project-wide.

Suggested change
- """Functions to convert annotations and videos to PoseInterface format."""
+ """Functions to convert annotations and videos to ``poseinterface`` format."""

Comment thread poseinterface/io.py
if encoding != EXPECTED_ENCODING:
    logging.warning(
        f"Video encoding {encoding} does not match "
        f"expected {EXPECTED_ENCODING}. Please reencode "
Member

Since re-encoding happens automatically if needed, should we reframe this as 'Will reencode' instead of 'Please reencode'?

Also, since this is actually an expected action (documented in the docstring of video_to_poseinterface), I wonder whether this should be an INFO instead of a WARNING.
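
A sketch of how the reworded message might read at INFO level. EXPECTED_ENCODING is again a placeholder, and needs_reencoding is a hypothetical helper used only to make the snippet self-contained:

```python
import logging

logger = logging.getLogger("poseinterface.io")

# Placeholder for the module-level constant; the real keys may differ.
EXPECTED_ENCODING = {"codec": "h264", "pixelformat": "yuv420p"}


def needs_reencoding(encoding: dict) -> bool:
    """Report whether a video's encoding differs from the expected one."""
    if encoding != EXPECTED_ENCODING:
        # Expected, documented behaviour, so INFO rather than WARNING
        logger.info(
            "Video encoding %s does not match expected %s. Will reencode.",
            encoding,
            EXPECTED_ENCODING,
        )
        return True
    return False


logging.basicConfig(level=logging.INFO)
needs_reencoding({"codec": "hevc", "pixelformat": "yuv420p"})
```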

Comment thread poseinterface/clips.py
@@ -0,0 +1,185 @@
"""Functions to extract clips from poseinterface videos."""
Member

Suggested change
- """Functions to extract clips from poseinterface videos."""
+ """Functions to extract clips from ``poseinterface`` videos."""

Comment thread poseinterface/clips.py
return clip_path, clip_json


def _extract_cliplabels(video_path, clips_dir, start_frame, duration):
Member

Quoting our spec on cliplabels.json:

  • Clip labels follow the same COCO keypoints format as frame labels, but with different conventions for image id and file_name values:
  • Each image id must be the 0-based index of the frame within the clip (i.e. 0, 1, 2, ...), not the index in the session video.
  • Each file_name must follow the same pattern as frame image filenames, but without the extension. The frame field in the file_name must correspond to the index of that frame in the session video.

This means that each entry in the images array encodes two pieces of information: the id gives the local position within the clip, while the frame field in file_name gives the global position in the session video. Note that in both cases the indices are 0-based.

For a clip starting at frame 1000 with a duration of 5 frames, the images array would be:

[
  {"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028},
  {"id": 1, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1001", "width": 1300, "height": 1028},
  {"id": 2, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1002", "width": 1300, "height": 1028},
  {"id": 3, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1003", "width": 1300, "height": 1028},
  {"id": 4, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1004", "width": 1300, "height": 1028}
]

Member

This function correctly selects the images corresponding to the clip, but there is a step missing: the IDs of the extracted images must be changed to start with 0 within the extracted clip (i.e. subtract start_frame). The file_names should be left as they are, to keep a reference to the global frame index.

Member

A similar problem applies to the annotation ids inside the extracted cliplabels file. For a clip starting at frame 1000, the first annotation entry returned by the current implementation is as follows:

 "annotations": [
    {
      "id": 1001,
      "image_id": 1000,
      "category_id": 1,
      "keypoints": [529.621887207031, 494.971038818359, 2, 543.039184570313, 501.402648925781, 2, 544.258728027344, 482.781982421875, 2, 599.23681640625, 496.673095703125, 2, 593.133361816406, 527.087524414063, 2, 604.429321289063, 470.6630859375, 2, 669.785888671875, 507.377227783203, 2, 673.556396484375, 613.862365722656, 2],
      "num_keypoints": 8,
      "bbox": [529.621887207031, 470.6630859375, 143.934509277344, 143.199279785156],
      "area": 20611.3180647455,
      "iscrowd": 0
    },

Since our spec recommends that annotation IDs are 1-indexed, the "id" here should probably be 1, with the "image_id" being 0, as per my previous comment.
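
The two re-indexing steps could be sketched roughly as follows. This assumes the full-video labels use the session frame index as the image id (as in the example above); reindex_cliplabels is a hypothetical name, not the PR's implementation:

```python
import copy


def reindex_cliplabels(labels: dict, start_frame: int, duration: int) -> dict:
    """Select a clip's entries and renumber ids to be local to the clip."""
    frame_range = range(start_frame, start_frame + duration)
    clip = copy.deepcopy(labels)  # leave the full-video labels untouched

    # Image ids become 0-based within the clip; file_name keeps the
    # global frame index
    clip["images"] = [
        {**img, "id": img["id"] - start_frame}
        for img in clip["images"]
        if img["id"] in frame_range
    ]

    # Annotation ids restart at 1; image_id follows the new local image id
    selected = [a for a in clip["annotations"] if a["image_id"] in frame_range]
    clip["annotations"] = [
        {**ann, "id": new_id, "image_id": ann["image_id"] - start_frame}
        for new_id, ann in enumerate(selected, start=1)
    ]
    return clip


# Toy full-video labels covering frames 998 to 1006
full_labels = {
    "images": [
        {"id": i, "file_name": f"sub-M708149_ses-20200317_cam-topdown_frame-{i}"}
        for i in range(998, 1007)
    ],
    "annotations": [
        {"id": i + 1, "image_id": i, "category_id": 1} for i in range(998, 1007)
    ],
}
clip_labels = reindex_cliplabels(full_labels, start_frame=1000, duration=5)
print(clip_labels["images"][0])
print(clip_labels["annotations"][0])
```

Renumbering annotations with enumerate rather than subtracting an offset keeps the ids contiguous even when a frame has several annotations.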

Comment thread poseinterface/clips.py
Comment on lines +90 to +92
clip_json = _extract_cliplabels(
    video_path, clips_dir, start_frame, duration
)
Member

I think this step should be optional, as in, only extract cliplabels if a corresponding (appropriately named) .json file is to be found in the same folder as the video, otherwise just do the video clip. This will make extract_clip broadly useful in all sorts of contexts (beyond the specific purpose of generating clips for poseinterface benchmarks).

As things are, you can't really use extract_clip, unless you have the companion .json file.
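
A minimal sketch of the optional lookup, assuming the companion file keeps the _cliplabels.json suffix currently used in this PR. find_video_labels and LABELS_SUFFIX are hypothetical names:

```python
from pathlib import Path
from typing import Optional

LABELS_SUFFIX = "_cliplabels.json"  # suffix currently used in this PR


def find_video_labels(video_path: Path) -> Optional[Path]:
    """Return the companion labels file next to the video, or None if absent."""
    candidate = video_path.parent / f"{video_path.stem}{LABELS_SUFFIX}"
    return candidate if candidate.is_file() else None


labels_path = find_video_labels(Path("session/sub-01_ses-01_cam-top.mp4"))
if labels_path is None:
    print("No companion labels found; only the video clip would be extracted.")
```

extract_clip could call _extract_cliplabels only when this lookup returns a path, and return None for clip_json otherwise.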

Comment thread poseinterface/clips.py
def _extract_cliplabels(video_path, clips_dir, start_frame, duration):
    """Extract clip labels from the video cliplabels.json file."""
    # Read file with labels for the whole video
    video_json = video_path.parent / f"{video_path.stem}_cliplabels.json"
Member

I'm no longer certain about the suffix of this json file, see this comment #10 (comment) and the discussion in the PR review for #45.

Comment thread poseinterface/clips.py
    video_path, clips_dir, start_frame, duration
)

return clip_path, clip_json
Member

Would be nice to log an INFO message about this function's success before returning, to signal that it has actually worked.
