You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/source/en/chat_templating_multimodal.md
+4-24Lines changed: 4 additions & 24 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -111,6 +111,7 @@ Some vision models also support video inputs. The message format is very similar
111
111
112
112
- The content `"type"` should be `"video"` to indicate the content is a video.
113
113
- For videos, it can be a link to the video (`"url"`) or it could be a file path (`"path"`). Videos loaded from a URL can only be decoded with [PyAV](https://pyav.basswood-io.com/docs/stable/) or [Decord](https://github.com/dmlc/decord).
114
+
- In addition to loading videos from a URL or file path, you can also pass decoded video data directly. This is useful if you’ve already preprocessed or decoded video frames elsewhere in memory (e.g., using OpenCV, decord, or torchvision). You don't need to save to files or store it in an URL.
114
115
115
116
> [!WARNING]
116
117
> Loading a video from `"url"` is only supported by the PyAV or Decord backends.
@@ -137,27 +138,11 @@ messages = [
137
138
]
138
139
```
139
140
140
-
### Passing decoded video objects
141
-
In addition to loading videos from a URL or file path, you can also pass decoded video data directly.
142
-
143
-
This is useful if you’ve already preprocessed or decoded video frames elsewhere in memory (e.g., using OpenCV, decord, or torchvision). You don't need to save to files or store it in an URL.
144
-
145
-
- Use the `"video"` type with a dictionary that includes:
146
-
-`"frames"` (`np.ndarray` or `torch.Tensor`):
147
-
A 4D array of shape (num_frames, channels, height, width) containing decoded video frames.
148
-
-`"metadata"` (`"VideoMetadata"` or `"dict"`):
149
-
Describes metadata for the video. If you provide a dictionary, it must include at least one of:
150
-
-`"fps"` (frames per second)
151
-
-`"duration"` (video duration in seconds)
152
-
if both `"fps"` and `"duration"` is provided, `"fps"` gets priority and `"duration"` is calculated based on `"fps"`
0 commit comments