vllm.inputs.llm ¶
Schema and utilities for input prompts to the LLM API.
Classes:
- DataPrompt – Represents generic inputs that are converted to PromptType by IO processor plugins.
- EmbedsPrompt – Schema for a prompt provided via token embeddings.
- ExplicitEncoderDecoderPrompt – Schema for a pair of encoder and decoder singleton prompts.
- MultiModalDataBuiltins – Type annotations for modality types predefined by vLLM.
- TextPrompt – Schema for a text prompt.
- TokensPrompt – Schema for a tokenized prompt.
Attributes:
- DecoderOnlyPrompt (TypeAlias) – Schema of a prompt for a decoder-only model.
- DecoderPrompt (TypeAlias) – Schema of a prompt for the decoder part of an encoder-decoder model.
- EncoderDecoderPrompt (TypeAlias) – Schema for a prompt for an encoder-decoder model.
- EncoderPrompt (TypeAlias) – Schema of a prompt for the encoder part of an encoder-decoder model.
- ModalityData (TypeAlias) – Either a single data item, or a list of data items. Can only be None if UUID is provided.
- MultiModalDataDict (TypeAlias) – A dictionary containing an entry for each modality type to input.
- MultiModalUUIDDict (TypeAlias) – A dictionary containing user-provided UUIDs for items in each modality.
- PromptType (TypeAlias) – Schema for any prompt, regardless of model type.
- SingletonPrompt (TypeAlias) – Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.
DecoderOnlyPrompt = str | TextPrompt | list[int] | TokensPrompt | EmbedsPrompt module-attribute ¶
Schema of a prompt for a decoder-only model:
- A text prompt (string or TextPrompt)
- A tokenized prompt (list of token IDs, or TokensPrompt)
- An embeddings prompt (EmbedsPrompt)
For encoder-decoder models, passing a singleton prompt is shorthand for passing ExplicitEncoderDecoderPrompt(encoder_prompt=prompt, decoder_prompt=None).
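As a rough illustration of the forms listed above, the sketch below builds one prompt of each non-embedding kind and labels it. TextPrompt and TokensPrompt are TypedDicts, so plain dicts with the same keys stand in for them here and vLLM is not imported. The helper `describe_prompt` is hypothetical (not part of vLLM), and the token IDs are arbitrary.

```python
from typing import Any


def describe_prompt(prompt: Any) -> str:
    """Hypothetical helper: name which decoder-only prompt form was given."""
    if isinstance(prompt, str):
        return "text (str)"
    if isinstance(prompt, list):
        return "tokens (list[int])"
    if isinstance(prompt, dict):
        # Check the most specific keys first: an EmbedsPrompt may also carry
        # prompt_token_ids, and a TokensPrompt may also carry prompt text.
        if "prompt_embeds" in prompt:
            return "EmbedsPrompt"
        if "prompt_token_ids" in prompt:
            return "TokensPrompt"
        if "prompt" in prompt:
            return "TextPrompt"
    raise TypeError(f"unrecognized prompt: {prompt!r}")


# Plain dicts standing in for the TypedDict schemas; token IDs are made up.
examples = [
    "Hello, world",
    {"prompt": "Hello, world"},
    [1, 15043, 3186],
    {"prompt_token_ids": [1, 15043, 3186], "prompt": "Hello, world"},
]
kinds = [describe_prompt(p) for p in examples]
```

Because the schemas are TypedDicts, the dict literals above are structurally the same values a type checker would accept as TextPrompt or TokensPrompt.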
DecoderPrompt = str | TextPrompt | list[int] | TokensPrompt module-attribute ¶
Schema of a prompt for the decoder part of an encoder-decoder model:
- A text prompt (string or TextPrompt)
- A tokenized prompt (list of token IDs, or TokensPrompt)
Note
Multi-modal inputs are not supported for decoder prompts.
EncoderDecoderPrompt = EncoderPrompt | ExplicitEncoderDecoderPrompt module-attribute ¶
Schema for a prompt for an encoder-decoder model.
You can pass a singleton encoder prompt, in which case the decoder prompt is considered to be None (i.e., infer automatically).
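The shorthand described above can be sketched as a small normalization step. The function `to_explicit` is a hypothetical helper written for this illustration, not a vLLM API; it only mirrors the documented rule that a singleton encoder prompt implies a decoder prompt of None.

```python
from typing import Any


def to_explicit(prompt: Any) -> dict:
    """Hypothetical helper: expand a singleton encoder prompt to the explicit form."""
    if isinstance(prompt, dict) and "encoder_prompt" in prompt:
        # Already an explicit encoder/decoder pair; return a shallow copy.
        return dict(prompt)
    # Singleton prompt: the decoder prompt is left as None (inferred automatically).
    return {"encoder_prompt": prompt, "decoder_prompt": None}


explicit = to_explicit("Translate to German: good morning")
```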
EncoderPrompt = str | TextPrompt | list[int] | TokensPrompt module-attribute ¶
Schema of a prompt for the encoder part of an encoder-decoder model:
- A text prompt (string or TextPrompt)
- A tokenized prompt (list of token IDs, or TokensPrompt)
ModalityData = _T | list[_T | None] | None module-attribute ¶
Either a single data item, or a list of data items. Can only be None if UUID is provided.
The number of data items allowed per modality is restricted by --limit-mm-per-prompt.
MultiModalDataDict = Mapping[str, ModalityData[Any]] module-attribute ¶
A dictionary containing an entry for each modality type to input.
The built-in modalities are defined by MultiModalDataBuiltins.
MultiModalUUIDDict = Mapping[str, Sequence[str | None] | str] module-attribute ¶
A dictionary containing user-provided UUIDs for items in each modality. If a UUID for an item is not provided, its entry will be None and MultiModalHasher will compute a hash for the item.
The UUID is used to identify the item for all caching purposes (input processing caching, embedding caching, prefix caching, etc.).
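The fallback behaviour described above (a user-provided UUID wins, a None entry gets a computed hash) can be sketched as follows. This is only an illustration: `resolve_ids` is a hypothetical helper, and SHA-256 over raw bytes is a stand-in chosen for the example, not a description of what MultiModalHasher actually computes.

```python
import hashlib
from typing import Optional


def resolve_ids(items: list[bytes], uuids: list[Optional[str]]) -> list[str]:
    """Use the user's UUID when given; otherwise fall back to a content hash."""
    if len(items) != len(uuids):
        raise ValueError("need one UUID entry (or None) per item")
    return [
        # sha256 is an assumption for this sketch, not vLLM's hashing scheme.
        uid if uid is not None else hashlib.sha256(item).hexdigest()
        for item, uid in zip(items, uuids)
    ]


images = [b"raw bytes of image A", b"raw bytes of image B"]
ids = resolve_ids(images, ["user-img-001", None])
```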
PromptType = DecoderOnlyPrompt | EncoderDecoderPrompt module-attribute ¶
Schema for any prompt, regardless of model type.
This is the input format accepted by most LLM APIs.
SingletonPrompt = DecoderOnlyPrompt | EncoderPrompt | DecoderPrompt module-attribute ¶
Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.
DataPrompt ¶
Bases: _PromptOptions
Represents generic inputs that are converted to PromptType by IO processor plugins.
Attributes:
- data (Any) – The input data.
- data_format (str) – The input data format.
Source code in vllm/inputs/llm.py
EmbedsPrompt ¶
Bases: _PromptOptions
Schema for a prompt provided via token embeddings.
Attributes:
- prompt (NotRequired[str]) – The prompt text corresponding to the token embeddings, if available.
- prompt_embeds (Tensor) – The embeddings of the prompt.
- prompt_is_token_ids (NotRequired[list[bool]]) – Per-position mask: True uses the real token ID; False uses the corresponding entry from prompt_embeds.
- prompt_token_ids (NotRequired[list[int]]) – Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts).
prompt instance-attribute ¶
The prompt text corresponding to the token embeddings, if available.
prompt_embeds instance-attribute ¶
The embeddings of the prompt.
prompt_is_token_ids instance-attribute ¶
Per-position mask: True uses the real token ID; False uses the corresponding entry from prompt_embeds. Must be the same length as prompt_token_ids when both are set.
prompt_token_ids instance-attribute ¶
Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts). The tokens at positions where prompt_is_token_ids is False are placeholder tokens that get replaced by entries from prompt_embeds in the forward pass.
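To make the interaction between prompt_token_ids, prompt_is_token_ids, and prompt_embeds concrete, here is a minimal sketch using plain Python lists in place of a Tensor. It assumes that prompt_embeds holds one row per position, which the text above suggests but does not state outright; `select_inputs` is a hypothetical helper and 0 is an arbitrary placeholder token ID.

```python
from typing import Any


def select_inputs(
    token_ids: list[int],
    is_token_ids: list[bool],
    embeds: list[list[float]],
) -> list[tuple[str, Any]]:
    """Pick, per position, either the real token ID or the embedding row."""
    if len(token_ids) != len(is_token_ids):
        raise ValueError("prompt_is_token_ids must match prompt_token_ids in length")
    # Assumption for this sketch: one embedding row per position.
    if len(embeds) != len(token_ids):
        raise ValueError("this sketch expects one embedding row per position")
    return [
        ("token", tid) if use_token else ("embed", row)
        for tid, use_token, row in zip(token_ids, is_token_ids, embeds)
    ]


token_ids = [101, 0, 0, 102]  # positions 1 and 2 hold placeholder IDs
mask = [True, False, False, True]
embeds = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
chosen = select_inputs(token_ids, mask, embeds)
```

The placeholder IDs at the False positions never reach the result; only the embedding rows do, mirroring the replacement described above.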
ExplicitEncoderDecoderPrompt ¶
Bases: TypedDict
Schema for a pair of encoder and decoder singleton prompts.
Note
This schema is not valid for decoder-only models.
Attributes:
- decoder_prompt (DecoderPrompt | None) – The prompt for the decoder part of the model.
- encoder_prompt (EncoderPrompt) – The prompt for the encoder part of the model.
MultiModalDataBuiltins ¶
Bases: TypedDict
Type annotations for modality types predefined by vLLM.
Attributes:
- audio (ModalityData[AudioItem]) – The input audio(s).
- image (ModalityData[ImageItem]) – The input image(s).
- video (ModalityData[VideoItem]) – The input video(s).
- vision_chunk (ModalityData[VisionChunk]) – The input visual atom(s): a unified modality for images and video chunks.
TextPrompt ¶
Bases: _PromptOptions
Schema for a text prompt.
Attributes:
- prompt (str) – The input text to be tokenized before passing to the model.
prompt instance-attribute ¶
The input text to be tokenized before passing to the model.
TokensPrompt ¶
Bases: _PromptOptions
Schema for a tokenized prompt.
Attributes:
- prompt (NotRequired[str]) – The prompt text corresponding to the token IDs, if available.
- prompt_token_ids (list[int]) – A list of token IDs to pass to the model.
- token_type_ids (NotRequired[list[int]]) – A list of token type IDs to pass to the cross encoder model.
_PromptOptions ¶
Bases: TypedDict
Additional options available to all SingletonPrompt types.
Attributes:
- cache_salt (NotRequired[str]) – Optional cache salt to be used for prefix caching.
- mm_processor_kwargs (NotRequired[dict[str, Any] | None]) – Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor.
- multi_modal_data (NotRequired[MultiModalDataDict | None]) – Optional multi-modal data to pass to the model, if the model supports it.
- multi_modal_uuids (NotRequired[MultiModalUUIDDict]) – Optional user-specified UUIDs for multimodal items, mapped by modality.
cache_salt instance-attribute ¶
Optional cache salt to be used for prefix caching.
mm_processor_kwargs instance-attribute ¶
Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor. Note that if multiple modalities have registered mappers (or similar components) for the model being considered, vLLM attempts to pass the mm_processor_kwargs to each of them.
multi_modal_data instance-attribute ¶
Optional multi-modal data to pass to the model, if the model supports it.
multi_modal_uuids instance-attribute ¶
Optional user-specified UUIDs for multimodal items, mapped by modality. Lists must match the number of items per modality and may contain None. For None entries, the hasher will compute IDs automatically; non-None entries override the default hashes for caching, and MUST be unique per multimodal item.
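The two constraints stated above (UUID lists match the item count per modality, and a non-None UUID identifies exactly one item) can be checked with a small sketch like the one below. `validate_uuids` is a hypothetical helper, not part of vLLM, and the string items are placeholders for real image data. Reading "unique per multimodal item" as "the same UUID must not label two different items" is an interpretation made for this example.

```python
from typing import Any


def validate_uuids(mm_data: dict[str, Any], mm_uuids: dict[str, Any]) -> None:
    """Hypothetical check of the length and uniqueness rules for UUIDs."""
    seen: dict[str, Any] = {}
    for modality, uuids in mm_uuids.items():
        uuid_list = [uuids] if isinstance(uuids, str) else list(uuids)
        data = mm_data.get(modality)
        if data is None:
            continue  # items identified by UUID only; nothing to compare against
        items = data if isinstance(data, list) else [data]
        if len(items) != len(uuid_list):
            raise ValueError(
                f"{modality}: {len(items)} items but {len(uuid_list)} UUID entries"
            )
        for item, uid in zip(items, uuid_list):
            if uid is None or item is None:
                continue  # None UUIDs are hashed automatically; None items have no data
            if uid in seen and seen[uid] != item:
                raise ValueError(f"UUID {uid!r} is reused for a different item")
            seen[uid] = item


data = {"image": ["<image A>", "<image B>"]}
validate_uuids(data, {"image": ["img-a", None]})  # passes: one entry per item
```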