docs/source/en/chat_templating_multimodal.md
67 lines changed: 67 additions & 0 deletions
@@ -137,6 +137,73 @@ messages = [
]
```

### Passing decoded video objects

In addition to loading videos from a URL or file path, you can also pass decoded video data directly.

This is useful if you’ve already decoded or preprocessed video frames in memory (for example, with OpenCV, decord, or torchvision). You don't need to save the frames to a file or host them at a URL.

- Use the `"video"` type with a dictionary that includes:
  - `"frames"` (`np.ndarray` or `torch.Tensor`):
    A 4D array of shape `(num_frames, channels, height, width)` containing the decoded video frames.
  - `"metadata"` (`VideoMetadata` or `dict`):
    Describes metadata for the video. If you provide a dictionary, it must include at least one of:
    - `"fps"` (frames per second)
    - `"duration"` (video duration in seconds)

    If both `"fps"` and `"duration"` are provided, `"fps"` takes priority and `"duration"` is calculated based on `"fps"`.
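
The structure described above can be sketched as follows. This is a minimal illustration, not the example from the documentation itself: it assumes NumPy, uses zero-filled arrays as a stand-in for frames decoded elsewhere, and the names `decoded`, `video_object`, and the `fps` value are hypothetical. The `implied_duration` line assumes that duration is derived as `num_frames / fps`, which the text does not spell out.

```python
import numpy as np

# Stand-in for frames decoded elsewhere; many decoders (e.g. OpenCV) yield
# one (height, width, channels) array per frame.
decoded = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(16)]

# Stack to (num_frames, height, width, channels), then move channels forward
# to get the (num_frames, channels, height, width) layout described above.
frames = np.stack(decoded).transpose(0, 3, 1, 2)

# Hypothetical frame rate. Supplying only "fps" satisfies the
# "at least one of fps / duration" requirement for a metadata dictionary.
fps = 8.0

video_object = {
    "frames": frames,
    "metadata": {"fps": fps},
}

# Assumption: the duration implied by the frame count and fps.
implied_duration = frames.shape[0] / fps

# The dictionary goes in the "video" field of a content entry.
message = {
    "role": "user",
    "content": [
        {"type": "video", "video": video_object},
        {"type": "text", "text": "What do you see in this video?"},
    ],
}
```

Transposing once up front keeps the frames in a single contiguous 4D array, so no intermediate file or URL is involved.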
"content": [{"type": "text", "text": "You are a friendly chatbot who always responds in the style of a pirate"}],
    },
    {
        "role": "user",
        "content": [
            {"type": "video", "video": video_object2},
            {"type": "text", "text": "What do you see in this video?"}
        ],
    },
]
```

Pass `messages` to [`~ProcessorMixin.apply_chat_template`] to tokenize the input content. [`~ProcessorMixin.apply_chat_template`] accepts a few extra parameters that control the sampling process.

The `video_load_backend` parameter refers to a specific framework to load a video. It supports [PyAV](https://pyav.basswood-io.com/docs/stable/), [Decord](https://github.com/dmlc/decord), [OpenCV](https://github.com/opencv/opencv), and [torchvision](https://pytorch.org/vision/stable/index.html).
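
A hedged sketch of how the backend choice might be passed along is shown below. The helper `tokenize_chat` and the `SUPPORTED_BACKENDS` tuple are hypothetical, and the lowercase backend strings are an assumption based on the four frameworks listed above. `processor` stands for any object exposing `apply_chat_template`, such as a loaded multimodal processor.

```python
# Assumed string names for the four backends listed above.
SUPPORTED_BACKENDS = ("pyav", "decord", "opencv", "torchvision")


def tokenize_chat(processor, messages, video_load_backend="pyav"):
    """Apply the chat template, loading any videos with the chosen backend."""
    # Fail early on a typo instead of deep inside video loading.
    if video_load_backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported video backend: {video_load_backend!r}")
    return processor.apply_chat_template(
        messages,
        add_generation_prompt=True,  # append the assistant turn marker
        tokenize=True,               # return token ids rather than a string
        return_dict=True,            # return a dict of model inputs
        video_load_backend=video_load_backend,
    )
```

Keeping the backend as an explicit argument makes it easy to switch frameworks depending on which one is installed.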