vllm.inputs ¶
Modules:
- engine – Schema and utilities for inputs to the engine client (LLMEngine/AsyncLLM).
- llm – Schema and utilities for input prompts to the LLM API.
- preprocess
Classes:
- DataPrompt – Represents generic inputs that are converted to PromptType by IO processor plugins.
- EmbedsInput – Represents embeddings-based input to the engine.
- EmbedsPrompt – Schema for a prompt provided via token embeddings.
- EncoderDecoderInput – A rendered EncoderDecoderPrompt.
- ExplicitEncoderDecoderPrompt – Schema for a pair of encoder and decoder singleton prompts.
- MultiModalDataBuiltins – Type annotations for modality types predefined by vLLM.
- MultiModalEncDecInput – Represents multi-modal input to the engine for encoder-decoder models.
- MultiModalInput – Represents multi-modal input to the engine.
- TextPrompt – Schema for a text prompt.
- TokensInput – Represents token-based input to the engine.
- TokensPrompt – Schema for a tokenized prompt.
Functions:
- embeds_input – Construct EmbedsInput from optional values.
- tokens_input – Construct TokensInput from optional values.
Attributes:
- DecoderOnlyEngineInput (TypeAlias) – A rendered DecoderOnlyPrompt.
- EngineInput (TypeAlias) – A rendered PromptType.
- ModalityData (TypeAlias) – Either a single data item, or a list of data items. Can only be None if UUID is provided.
- MultiModalDataDict (TypeAlias) – A dictionary containing an entry for each modality type to input.
- MultiModalHashes (TypeAlias) – A dictionary containing per-item hashes for each modality.
- MultiModalPlaceholders (TypeAlias) – A dictionary containing per-item placeholder ranges for each modality.
- MultiModalUUIDDict (TypeAlias) – A dictionary containing user-provided UUIDs for items in each modality.
- PromptType (TypeAlias) – Schema for any prompt, regardless of model type.
- SingletonInput (TypeAlias) – A rendered SingletonPrompt.
- SingletonPrompt (TypeAlias) – Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.
DecoderOnlyEngineInput = TokensInput | EmbedsInput | MultiModalInput module-attribute ¶
A rendered DecoderOnlyPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
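The three members of this union each carry a `type` field with a distinct literal ('token', 'embeds', 'multimodal'), so a consumer can branch on it. The following is a minimal sketch that works on plain dictionaries; `describe_decoder_only_input` is a hypothetical helper written for illustration and is not part of vLLM.

```python
from typing import Any, Mapping


def describe_decoder_only_input(engine_input: Mapping[str, Any]) -> str:
    """Summarize a decoder-only engine input by branching on its `type` field.

    Hypothetical helper: it only relies on the field names and `type`
    literals documented on this page, and treats the input as a plain dict.
    """
    kind = engine_input["type"]
    if kind == "token":
        return f"tokens: {len(engine_input['prompt_token_ids'])} ids"
    if kind == "embeds":
        # Pure-embeds inputs have no prompt_token_ids; mixed-mode inputs do.
        mode = "mixed" if "prompt_token_ids" in engine_input else "pure"
        return f"embeds: {mode}"
    if kind == "multimodal":
        modalities = sorted(engine_input["mm_placeholders"])
        return f"multimodal: modalities={modalities}"
    raise ValueError(f"unknown input type: {kind!r}")
```

Branching on a literal discriminator like this also lets a static type checker narrow the union, provided the real TypedDict classes are used instead of a generic mapping.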
EngineInput = DecoderOnlyEngineInput | EncoderDecoderInput module-attribute ¶
A rendered PromptType which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
ModalityData = _T | list[_T | None] | None module-attribute ¶
Either a single data item, or a list of data items. Can only be None if UUID is provided.
The number of data items allowed per modality is restricted by --limit-mm-per-prompt.
MultiModalDataDict = Mapping[str, ModalityData[Any]] module-attribute ¶
A dictionary containing an entry for each modality type to input.
The built-in modalities are defined by MultiModalDataBuiltins.
MultiModalHashes = Mapping[str, list[str]] module-attribute ¶
A dictionary containing per-item hashes for each modality.
MultiModalPlaceholders = Mapping[str, Sequence['PlaceholderRange']] module-attribute ¶
A dictionary containing per-item placeholder ranges for each modality.
MultiModalUUIDDict = Mapping[str, Sequence[str | None] | str] module-attribute ¶
A dictionary containing user-provided UUIDs for items in each modality. If a UUID for an item is not provided, its entry will be None and MultiModalHasher will compute a hash for the item.
The UUID will be used to identify the item for all caching purposes (input processing caching, embedding caching, prefix caching, etc.).
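The rule that a data item can only be None if a UUID is provided can be checked before a request is submitted. The sketch below uses plain dictionaries and lists; `find_unidentified_items` is a hypothetical helper, and how vLLM itself pairs data items with UUIDs is an assumption here (items and UUIDs are matched by index, and a bare string is treated as the UUID of a single item).

```python
from typing import Any, Mapping, Optional, Sequence, Union

UUIDs = Union[Sequence[Optional[str]], str]


def find_unidentified_items(
    mm_data: Mapping[str, Any],
    mm_uuids: Mapping[str, UUIDs],
) -> list[tuple[str, int]]:
    """Return (modality, index) pairs whose data is None and that have no UUID.

    Hypothetical helper: assumes items and UUIDs line up by index.
    """
    missing: list[tuple[str, int]] = []
    for modality, data in mm_data.items():
        # A bare item (or a bare None) is treated as a one-element list.
        items = data if isinstance(data, list) else [data]
        uuids = mm_uuids.get(modality)
        if uuids is None:
            uuid_list: list[Optional[str]] = []
        elif isinstance(uuids, str):
            uuid_list = [uuids]
        else:
            uuid_list = list(uuids)
        for index, item in enumerate(items):
            uuid = uuid_list[index] if index < len(uuid_list) else None
            if item is None and uuid is None:
                missing.append((modality, index))
    return missing
```

An empty result means every omitted item can still be identified for caching; any returned pair names an item that has neither data nor a UUID.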
PromptType = DecoderOnlyPrompt | EncoderDecoderPrompt module-attribute ¶
Schema for any prompt, regardless of model type.
This is the input format accepted by most LLM APIs.
SingletonInput = DecoderOnlyEngineInput | MultiModalEncDecInput module-attribute ¶
A rendered SingletonPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
SingletonPrompt = DecoderOnlyPrompt | EncoderPrompt | DecoderPrompt module-attribute ¶
Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.
DataPrompt ¶
Bases: _PromptOptions
Represents generic inputs that are converted to PromptType by IO processor plugins.
Attributes:
- data (Any) – The input data.
- data_format (str) – The input data format.
Source code in vllm/inputs/llm.py
EmbedsInput ¶
Bases: _InputOptions
Represents embeddings-based input to the engine.
Attributes:
- is_token_ids (NotRequired[list[bool]]) – Per-position mask for mixed-mode inputs. True means the position is a real token ID; False means the position uses a pre-computed embedding row from prompt_embeds.
- prompt (NotRequired[str]) – The prompt text corresponding to the token IDs, if available.
- prompt_embeds (Tensor) – The embeddings of the prompt.
- prompt_token_ids (NotRequired[list[int]]) – Token IDs of the rendered prompt. Only set for mixed-mode inputs (chat completion with prompt_embeds content parts).
- type (Literal['embeds']) – The type of input.
Source code in vllm/inputs/engine.py
is_token_ids instance-attribute ¶
Per-position mask for mixed-mode inputs. True means the position is a real token ID (use the model's embedding layer); False means the position uses a pre-computed embedding row from prompt_embeds. Length MUST equal len(prompt_token_ids). For pure-embeds inputs this field is absent.
prompt instance-attribute ¶
The prompt text corresponding to the token IDs, if available.
prompt_embeds instance-attribute ¶
The embeddings of the prompt.
prompt_token_ids instance-attribute ¶
Token IDs of the rendered prompt. Only set for mixed-mode inputs (chat completion with prompt_embeds content parts). When present, is_token_ids MUST also be present and have the same length. For pure-embeds inputs this field is absent.
type instance-attribute ¶
The type of input.
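The two MUST constraints above (prompt_token_ids and is_token_ids appear together, and with equal length) are easy to check on a plain dictionary. This is a minimal sketch; `check_mixed_mode_fields` is a hypothetical helper and not a vLLM function.

```python
from typing import Any, Mapping


def check_mixed_mode_fields(embeds_input: Mapping[str, Any]) -> None:
    """Raise ValueError if the mixed-mode fields are inconsistent.

    Hypothetical helper encoding the documented constraints:
    - pure-embeds inputs carry neither prompt_token_ids nor is_token_ids;
    - mixed-mode inputs carry both, with the same length.
    """
    has_ids = "prompt_token_ids" in embeds_input
    has_mask = "is_token_ids" in embeds_input
    if has_ids != has_mask:
        raise ValueError(
            "prompt_token_ids and is_token_ids must be set together"
        )
    if has_ids:
        n_ids = len(embeds_input["prompt_token_ids"])
        n_mask = len(embeds_input["is_token_ids"])
        if n_ids != n_mask:
            raise ValueError(
                f"length mismatch: {n_ids} token ids vs {n_mask} mask entries"
            )
```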
EmbedsPrompt ¶
Bases: _PromptOptions
Schema for a prompt provided via token embeddings.
Attributes:
- prompt (NotRequired[str]) – The prompt text corresponding to the token embeddings, if available.
- prompt_embeds (Tensor) – The embeddings of the prompt.
- prompt_is_token_ids (NotRequired[list[bool]]) – Per-position mask: True uses the real token ID; False uses the corresponding entry from prompt_embeds.
- prompt_token_ids (NotRequired[list[int]]) – Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts).
Source code in vllm/inputs/llm.py
prompt instance-attribute ¶
The prompt text corresponding to the token embeddings, if available.
prompt_embeds instance-attribute ¶
The embeddings of the prompt.
prompt_is_token_ids instance-attribute ¶
Per-position mask: True uses the real token ID; False uses the corresponding entry from prompt_embeds. Must be the same length as prompt_token_ids when both are set.
prompt_token_ids instance-attribute ¶
Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts). The tokens at positions where prompt_is_token_ids is False are placeholder tokens that get replaced by entries from prompt_embeds in the forward pass.
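To make the mask semantics concrete, the sketch below merges the two sources position by position using plain Python lists. It is an illustration only: `merge_mixed_mode` and `embedding_table` are hypothetical names, and it assumes prompt_embeds holds one row per prompt position, which this page does not state. The actual substitution happens inside the model's forward pass.

```python
from typing import Mapping, Sequence

Row = Sequence[float]


def merge_mixed_mode(
    prompt_token_ids: Sequence[int],
    prompt_is_token_ids: Sequence[bool],
    prompt_embeds: Sequence[Row],
    embedding_table: Mapping[int, Row],
) -> list[Row]:
    """Pick, per position, either the looked-up token embedding or the given row.

    Assumption (not confirmed by the source): prompt_embeds is aligned with
    prompt_token_ids, one row per position.
    """
    if not (len(prompt_token_ids) == len(prompt_is_token_ids) == len(prompt_embeds)):
        raise ValueError("all three inputs must have the same length")
    merged: list[Row] = []
    for token_id, is_token, row in zip(
        prompt_token_ids, prompt_is_token_ids, prompt_embeds
    ):
        # True: embed the real token ID. False: the token is a placeholder,
        # so use the pre-computed row instead.
        merged.append(embedding_table[token_id] if is_token else row)
    return merged
```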
EncoderDecoderInput ¶
Bases: TypedDict
A rendered EncoderDecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
Attributes:
- arrival_time (NotRequired[float]) – The time when the input was received (before rendering).
- decoder_prompt (DecoderEngineInput) – The inputs for the decoder portion.
- encoder_prompt (EncoderInput) – The inputs for the encoder portion.
Source code in vllm/inputs/engine.py
ExplicitEncoderDecoderPrompt ¶
Bases: TypedDict
Schema for a pair of encoder and decoder singleton prompts.
Note
This schema is not valid for decoder-only models.
Attributes:
- decoder_prompt (DecoderPrompt | None) – The prompt for the decoder part of the model.
- encoder_prompt (EncoderPrompt) – The prompt for the encoder part of the model.
Source code in vllm/inputs/llm.py
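Because both keys are listed as required (decoder_prompt may be None but must be present), an explicit encoder/decoder prompt can be told apart from a singleton prompt by its keys. The sketch below uses a local stand-in TypedDict rather than the vLLM class; `ExplicitEncoderDecoderPromptSketch` and `is_explicit_encoder_decoder` are hypothetical names, and using a text-prompt dictionary as the encoder prompt is an assumption.

```python
from typing import Any, Optional, TypedDict


class ExplicitEncoderDecoderPromptSketch(TypedDict):
    """Local stand-in mirroring the two documented fields (not the vLLM class)."""

    encoder_prompt: Any
    decoder_prompt: Optional[Any]


def is_explicit_encoder_decoder(prompt: Any) -> bool:
    """Return True if the prompt dict has both encoder and decoder keys."""
    return (
        isinstance(prompt, dict)
        and "encoder_prompt" in prompt
        and "decoder_prompt" in prompt
    )


# Assumed usage: a text prompt for the encoder and no decoder prompt.
example: ExplicitEncoderDecoderPromptSketch = {
    "encoder_prompt": {"prompt": "Translate to German: Hello"},
    "decoder_prompt": None,
}
```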
MultiModalDataBuiltins ¶
Bases: TypedDict
Type annotations for modality types predefined by vLLM.
Attributes:
- audio (ModalityData[AudioItem]) – The input audio(s).
- image (ModalityData[ImageItem]) – The input image(s).
- video (ModalityData[VideoItem]) – The input video(s).
- vision_chunk (ModalityData[VisionChunk]) – The input visual atom(s): a unified modality for images and video chunks.
Source code in vllm/inputs/llm.py
MultiModalEncDecInput ¶
Bases: MultiModalInput
Represents multi-modal input to the engine for encoder-decoder models.
Note
Even text-only encoder-decoder models are currently implemented as multi-modal models for convenience. (Example: https://github.com/vllm-project/bart-plugin)
Attributes:
- encoder_prompt (NotRequired[str]) – The prompt text corresponding to the encoder token IDs, if available.
- encoder_prompt_token_ids (list[int]) – The processed token IDs of the encoder prompt.
Source code in vllm/inputs/engine.py
MultiModalInput ¶
Bases: _InputOptions
Represents multi-modal input to the engine.
Attributes:
- mm_hashes (MultiModalHashes) – The hashes of the multi-modal data.
- mm_kwargs (MultiModalKwargsOptionalItems) – Keyword arguments to be directly passed to the model after batching.
- mm_placeholders (MultiModalPlaceholders) – For each modality, information about the placeholder tokens in prompt_token_ids.
- prompt (NotRequired[str]) – The prompt text corresponding to the token IDs, if available.
- prompt_token_ids (list[int]) – The processed token IDs, which include placeholder tokens.
- type (Literal['multimodal']) – The type of input.
Source code in vllm/inputs/engine.py
mm_hashes instance-attribute ¶
The hashes of the multi-modal data.
mm_kwargs instance-attribute ¶
Keyword arguments to be directly passed to the model after batching.
mm_placeholders instance-attribute ¶
For each modality, information about the placeholder tokens in prompt_token_ids.
prompt instance-attribute ¶
The prompt text corresponding to the token IDs, if available.
prompt_token_ids instance-attribute ¶
The processed token IDs, which include placeholder tokens.
type instance-attribute ¶
The type of input.
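The relationship between mm_placeholders and prompt_token_ids can be pictured as a set of index ranges per modality. The sketch below is illustrative only: `RangeSketch` is a hypothetical stand-in with an assumed `offset` and `length`, since the fields of the real PlaceholderRange are not described on this page, and `placeholder_positions` is not a vLLM function.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class RangeSketch:
    """Hypothetical stand-in for a placeholder range (assumed fields)."""

    offset: int  # index of the first placeholder token in prompt_token_ids
    length: int  # number of consecutive placeholder tokens


def placeholder_positions(
    mm_placeholders: Mapping[str, Sequence[RangeSketch]],
) -> dict[str, list[int]]:
    """Expand each modality's ranges into explicit token positions."""
    positions: dict[str, list[int]] = {}
    for modality, ranges in mm_placeholders.items():
        indices: list[int] = []
        for item_range in ranges:
            indices.extend(
                range(item_range.offset, item_range.offset + item_range.length)
            )
        positions[modality] = indices
    return positions
```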
TextPrompt ¶
Bases: _PromptOptions
Schema for a text prompt.
Attributes:
- prompt (str) – The input text to be tokenized before passing to the model.
Source code in vllm/inputs/llm.py
prompt instance-attribute ¶
The input text to be tokenized before passing to the model.
TokensInput ¶
Bases: _InputOptions
Represents token-based input to the engine.
Attributes:
- prompt (NotRequired[str]) – The prompt text corresponding to the token IDs, if available.
- prompt_token_ids (list[int]) – The token IDs of the prompt.
- type (Literal['token']) – The type of input.
Source code in vllm/inputs/engine.py
TokensPrompt ¶
Bases: _PromptOptions
Schema for a tokenized prompt.
Attributes:
- prompt (NotRequired[str]) – The prompt text corresponding to the token IDs, if available.
- prompt_token_ids (list[int]) – A list of token IDs to pass to the model.
- token_type_ids (NotRequired[list[int]]) – A list of token type IDs to pass to the cross encoder model.
Source code in vllm/inputs/llm.py
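The prompt schemas share some optional fields (a TokensPrompt may carry prompt text, and an EmbedsPrompt may carry prompt_token_ids), so the required field is what identifies each one. The sketch below shows one way to classify a plain prompt dictionary by checking required keys from most to least specific; `classify_prompt` is a hypothetical helper, and vLLM's own parsing may differ.

```python
from typing import Any, Mapping


def classify_prompt(prompt: Mapping[str, Any]) -> str:
    """Classify a prompt dict as 'embeds', 'tokens', or 'text'.

    Hypothetical helper. Order matters: prompt_embeds is checked first
    because an embeddings prompt may also contain prompt_token_ids, and
    prompt_token_ids is checked before prompt because a tokenized prompt
    may also contain the prompt text.
    """
    if "prompt_embeds" in prompt:
        return "embeds"
    if "prompt_token_ids" in prompt:
        return "tokens"
    if "prompt" in prompt:
        return "text"
    raise ValueError("prompt has none of the expected keys")
```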
embeds_input(prompt_embeds, *, prompt=None, cache_salt=None, prompt_token_ids=None, is_token_ids=None) ¶
Construct EmbedsInput from optional values.
Source code in vllm/inputs/engine.py
tokens_input(prompt_token_ids, *, prompt=None, cache_salt=None) ¶
Construct TokensInput from optional values.
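One plausible reading of "from optional values" is that keys are only set when a value is given, which matches the NotRequired fields on TokensInput. The sketch below is a local stand-in, not the vLLM implementation: `tokens_input_sketch` is a hypothetical name, and storing the salt under a `cache_salt` key is an assumption based on the parameter name.

```python
from typing import Any, Optional


def tokens_input_sketch(
    prompt_token_ids: list[int],
    *,
    prompt: Optional[str] = None,
    cache_salt: Optional[str] = None,
) -> dict[str, Any]:
    """Build a token-input dict, omitting optional keys that were not given.

    Stand-in for illustration; mirrors the documented signature only.
    """
    result: dict[str, Any] = {
        "type": "token",
        "prompt_token_ids": prompt_token_ids,
    }
    if prompt is not None:
        result["prompt"] = prompt
    if cache_salt is not None:
        # Assumed key name, taken from the keyword argument.
        result["cache_salt"] = cache_salt
    return result
```

Leaving absent keys out, rather than storing None, keeps `"prompt" in result` a reliable test for whether the text is available.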