Handling Variable Mel Spectrogram Shapes Due to Varying num_frames #1

@Sushiman31

I've been looking at the function extract_mfcc_from_audio() in feature_extraction_pipline.py, and I noticed that the number of frames (num_frames) is read at run time from the corresponding video file using ffmpeg.probe(). Since videos can differ in duration, the extracted Mel spectrograms can end up with different shapes (i.e., a different number of time steps).

I have a few questions about how you handle this variability:

Do all videos in your dataset have the same duration, ensuring a consistent num_frames?

If not, how do you handle the case where the extracted Mel spectrograms have different shapes?

   Do you apply padding or truncation later in the pipeline?

   Does your model support variable-length inputs?
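
To make the padding/truncation question concrete, here is a minimal sketch of what I have in mind. It is not taken from this repository: the helper name pad_or_truncate, the (n_mels, num_frames) layout, the target of 100 frames, and the zero pad value are all assumptions for illustration.

```python
import numpy as np


def pad_or_truncate(mel: np.ndarray, target_frames: int, pad_value: float = 0.0) -> np.ndarray:
    """Force a (n_mels, num_frames) array to shape (n_mels, target_frames).

    Hypothetical helper: longer inputs are cut at the end, shorter inputs
    are right-padded with pad_value.
    """
    n_mels, num_frames = mel.shape
    if num_frames >= target_frames:
        # Truncate: keep only the first target_frames time steps.
        return mel[:, :target_frames]
    # Pad: allocate the target shape and copy the real frames into the left part.
    padded = np.full((n_mels, target_frames), pad_value, dtype=mel.dtype)
    padded[:, :num_frames] = mel
    return padded


# Two dummy spectrograms with different numbers of time steps.
short_clip = np.ones((40, 75), dtype=np.float32)
long_clip = np.ones((40, 120), dtype=np.float32)

# After fixing the length, the clips can be stacked into one batch.
batch = np.stack([pad_or_truncate(m, 100) for m in (short_clip, long_clip)])
print(batch.shape)
```

If the spectrograms are log-scaled, zero may not correspond to silence, so a different pad_value (or a mask over the padded frames) might be more appropriate.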
