vllm.multimodal.utils
Functions:
- argsort_mm_positions – Given a MultiModalPlaceholders, output a sequence of keys to sort the dictionary by offset.
- encode_audio_base64 – Encode audio as base64.
- encode_audio_url – Encode audio as a data URL.
- encode_image_base64 – Encode a Pillow image to base64 format.
- encode_image_url – Encode a Pillow image as a data URL.
- fetch_audio – Fetch an audio file from a URL.
- fetch_image – Fetch an image file from a URL.
- fetch_video – Fetch a video file from a URL.
- get_mm_features_in_window – Return (lo, hi) indices for features overlapping [start, end).
- group_and_batch_mm_items – Group consecutive items (possibly from different requests) into batches.
- group_and_batch_mm_kwargs – Group consecutive items (possibly from different requests) into batches, restricted to a single modality per batch.
argsort_mm_positions(mm_positions)
Given a MultiModalPlaceholders, output a sequence of keys to sort the dictionary by offset (starting index in the input sequence) in ascending order.
Returns:
- list[tuple[str, int]] – A list of (modality, idx) keys, each of which can be used to access an item via mm_positions[modality][idx].
Source code in vllm/multimodal/utils.py
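The ordering described above can be sketched with plain Python containers. This is an illustration only, not the vLLM implementation: `Placeholder` and `argsort_by_offset` are hypothetical stand-ins, and the real MultiModalPlaceholders values may carry more than an offset and a length.

```python
from typing import NamedTuple


class Placeholder(NamedTuple):
    """Hypothetical stand-in for one placeholder range in the prompt."""
    offset: int
    length: int


def argsort_by_offset(mm_positions):
    """Return (modality, idx) keys ordered by ascending placeholder offset."""
    keys = [
        (modality, idx)
        for modality, ranges in mm_positions.items()
        for idx in range(len(ranges))
    ]
    # Sort the keys, not the placeholders, so callers can index back
    # into the original dictionary.
    return sorted(keys, key=lambda key: mm_positions[key[0]][key[1]].offset)


positions = {
    "image": [Placeholder(0, 4), Placeholder(30, 4)],
    "audio": [Placeholder(10, 8)],
}
order = argsort_by_offset(positions)
print(order)
```

Each returned key indexes back into the input, so `positions[modality][idx]` visits the placeholders in prompt order.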
encode_audio_base64(audio, sampling_rate, *, format='WAV')
Encode audio as base64.
encode_audio_url(audio, sampling_rate, *, format='WAV')
Encode audio as a data URL.
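To show what "base64" and "data URL" mean here, the sketch below builds a WAV payload with the standard library and wraps it in a data URL. It does not call vLLM: `pcm16_to_wav_bytes` and `to_data_url` are hypothetical helpers, and the `audio/wav` media type is an assumption about how a WAV payload would be labelled.

```python
import base64
import io
import math
import struct
import wave


def pcm16_to_wav_bytes(samples, sampling_rate):
    """Pack floats in [-1, 1] as 16-bit mono PCM inside a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sampling_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(frames)
    return buf.getvalue()


def to_data_url(payload, media_type):
    """Wrap raw bytes in a base64 data URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# 10 ms of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(160)]
wav_bytes = pcm16_to_wav_bytes(tone, 16000)
url = to_data_url(wav_bytes, "audio/wav")
```

The payload after the comma decodes back to the original bytes, which is how a consumer of the URL recovers the audio.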
encode_image_base64(image, *, image_mode='RGB', format='PNG')
Encode a Pillow image to base64 format.
By default, the image is converted into RGB format before being encoded.
encode_image_url(image, *, image_mode='RGB', format='PNG')
Encode a Pillow image as a data URL.
By default, the image is converted into RGB format before being encoded.
fetch_audio(audio_url, audio_io_kwargs=None)
Parameters:
- audio_url (str) – URL of the audio file to fetch.
- audio_io_kwargs (dict[str, Any] | None, default: None) – Additional keyword arguments used when handling audio IO.
Warning
This method has direct access to local files and is only intended to be called by user code. Never call this from the online server!
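One way user code might act on this warning is to check the URL scheme before passing untrusted input to a fetch helper. The sketch below is an assumption-laden illustration, not part of vLLM: `SAFE_SCHEMES` and `is_safe_for_server` are hypothetical names, and it assumes local access would go through `file:` URLs or bare paths.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; this is not a vLLM setting.
SAFE_SCHEMES = frozenset({"http", "https", "data"})


def is_safe_for_server(url):
    """Return True if the URL scheme cannot point at the local filesystem."""
    return urlparse(url).scheme.lower() in SAFE_SCHEMES


candidates = [
    "https://example.com/clip.wav",
    "data:audio/wav;base64,AAAA",
    "file:///etc/passwd",
    "/tmp/clip.wav",
]
allowed = [url for url in candidates if is_safe_for_server(url)]
```

A scheme check like this covers only the URL string itself; it does not address redirects or other server-side request risks.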
fetch_image(image_url, image_io_kwargs=None)
Parameters:
- image_url (str) – URL of the image file to fetch.
- image_io_kwargs (dict[str, Any] | None, default: None) – Additional keyword arguments used when handling image IO.
Warning
This method has direct access to local files and is only intended to be called by user code. Never call this from the online server!
fetch_video(video_url, video_io_kwargs=None)
Parameters:
- video_url (str) – URL of the video file to fetch.
- video_io_kwargs (dict[str, Any] | None, default: None) – Additional keyword arguments used when handling video IO.
Warning
This method has direct access to local files and is only intended to be called by user code. Never call this from the online server!
get_mm_features_in_window(mm_features, start, end)
Return (lo, hi) indices for features overlapping [start, end).
Assumes mm_features are sorted by offset and non-overlapping, so offset + length is also sorted.
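Because both the offsets and the end positions are sorted under that assumption, the window lookup can be done with two binary searches. The following is a minimal sketch, not the vLLM source: features are modelled as hypothetical `(offset, length)` tuples, and `features_in_window` is an illustrative name.

```python
from bisect import bisect_left, bisect_right


def features_in_window(features, start, end):
    """Return (lo, hi) so that features[lo:hi] overlap [start, end).

    `features` is a list of (offset, length) tuples, sorted by offset
    and non-overlapping.
    """
    offsets = [offset for offset, _ in features]
    ends = [offset + length for offset, length in features]
    lo = bisect_right(ends, start)  # first feature ending after `start`
    hi = bisect_left(offsets, end)  # first feature starting at or after `end`
    return lo, max(lo, hi)


features = [(0, 4), (10, 5), (20, 3)]
print(features_in_window(features, 3, 12))   # (0, 2)
print(features_in_window(features, 4, 10))   # (1, 1): nothing overlaps
print(features_in_window(features, 15, 21))  # (2, 3)
```

A feature overlaps the window exactly when it starts before `end` and ends after `start`, which is what the two searches test.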
group_and_batch_mm_items(items, *, device=None, pin_memory=False)
Group consecutive items (possibly from different requests) into batches.
Items must be split across groups if any of the following occurs, as the batch would otherwise be invalid:
- They have different fields (e.g. mixed image and embedding inputs).
- They have different values in MultiModalSharedField.
Parameters:
- items (Sequence[MultiModalKwargsItem]) – List of MultiModalKwargsItem.
- device (Device, default: None) – The device to place the grouped tensors on.
- pin_memory (bool, default: False) – Whether to pin memory for faster host-to-device transfer.
Yields:
- Generator[tuple[int, BatchedTensorInputs]] – A tuple (num_items, grouped_kwargs), where:
  - num_items is the number of items in the batch;
  - grouped_kwargs is a dictionary of keyword arguments to pass to the model.
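The two splitting rules can be sketched with `itertools.groupby`, which only merges adjacent elements that share a key. This is a simplified illustration, not vLLM's implementation: items are modelled as hypothetical dictionaries with a `"data"` part (per-item fields) and a `"shared"` part (values that must match across a batch), and plain lists stand in for batched tensors.

```python
from itertools import groupby


def batch_key(item):
    """Items may share a batch only if field names and shared values match."""
    return (
        tuple(sorted(item["data"])),
        tuple(sorted(item["shared"].items())),
    )


def group_consecutive(items):
    """Yield (num_items, grouped_kwargs) for each run of compatible items."""
    for _, run in groupby(items, key=batch_key):
        batch = list(run)
        grouped = {
            name: [item["data"][name] for item in batch]
            for name in batch[0]["data"]
        }
        grouped.update(batch[0]["shared"])  # identical across the run
        yield len(batch), grouped


items = [
    {"data": {"pixel_values": 1}, "shared": {"patch_size": 14}},
    {"data": {"pixel_values": 2}, "shared": {"patch_size": 14}},
    {"data": {"image_embeds": 3}, "shared": {"patch_size": 14}},  # new fields
    {"data": {"pixel_values": 4}, "shared": {"patch_size": 16}},  # new shared value
    {"data": {"pixel_values": 5}, "shared": {"patch_size": 14}},
]
batches = list(group_consecutive(items))
```

The last item matches the first two but is not adjacent to them, so it forms its own batch; grouping is consecutive, not global.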
group_and_batch_mm_kwargs(mm_kwargs, *, device=None, pin_memory=False)
Group consecutive items (possibly from different requests) into batches.
Items must be split across groups if any of the following occurs, as the batch would otherwise be invalid:
- They have different fields (e.g. mixed image and embedding inputs).
- They have different values in MultiModalSharedField.
To simplify the implementation of embed_multimodal, we add another restriction that the items in a batch must belong to the same modality.
Parameters:
- mm_kwargs (list[tuple[str, MultiModalKwargsItem]]) – List of (modality, item).
- device (Device, default: None) – The device to place the grouped tensors on.
- pin_memory (bool, default: False) – Whether to pin memory for faster host-to-device transfer.
Yields:
- tuple[str, int, BatchedTensorInputs] – A tuple (modality, num_items, grouped_kwargs), where:
  - modality is the modality of the batch;
  - num_items is the number of items in the batch;
  - grouped_kwargs is a dictionary of keyword arguments to pass to the model.
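The extra same-modality restriction amounts to adding the modality to the grouping key. The sketch below is an assumption-based illustration rather than vLLM's code: each item is a hypothetical dictionary of field name to value, `group_by_modality` is an illustrative name, and lists stand in for batched tensors.

```python
from itertools import groupby


def group_by_modality(mm_kwargs):
    """Yield (modality, num_items, grouped_kwargs) for consecutive runs."""

    def key(pair):
        modality, item = pair
        # Same modality and same field names are both required.
        return modality, tuple(sorted(item))

    for (modality, _), run in groupby(mm_kwargs, key=key):
        items = [item for _, item in run]
        grouped = {name: [item[name] for item in items] for name in items[0]}
        yield modality, len(items), grouped


mm_kwargs = [
    ("image", {"pixel_values": 1}),
    ("image", {"pixel_values": 2}),
    ("video", {"pixel_values": 3}),  # same field name, different modality
    ("image", {"pixel_values": 4}),
]
batches = list(group_by_modality(mm_kwargs))
```

Here the video item splits the image items into two batches even though every item has the same field name.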