`vllm.inputs.llm` ¶

Schema and utilities for input prompts to the LLM API.

Classes:

DataPrompt –

Represents generic inputs that are converted to PromptType by IO processor plugins.
EmbedsPrompt –

Schema for a prompt provided via token embeddings.
ExplicitEncoderDecoderPrompt –

Schema for a pair of encoder and decoder singleton prompts.
MultiModalDataBuiltins –

Type annotations for modality types predefined by vLLM.
TextPrompt –

Schema for a text prompt.
TokensPrompt –

Schema for a tokenized prompt.

Attributes:

DecoderOnlyPrompt (TypeAlias) –

Schema of a prompt for a decoder-only model.
DecoderPrompt (TypeAlias) –

Schema of a prompt for the decoder part of an encoder-decoder model.
EncoderDecoderPrompt (TypeAlias) –

Schema for a prompt for an encoder-decoder model.
EncoderPrompt (TypeAlias) –

Schema of a prompt for the encoder part of an encoder-decoder model.
ModalityData (TypeAlias) –

Either a single data item, or a list of data items. Can only be None if UUID is provided.
MultiModalDataDict (TypeAlias) –

A dictionary containing an entry for each modality type to input.
MultiModalUUIDDict (TypeAlias) –

A dictionary containing user-provided UUIDs for items in each modality.
PromptType (TypeAlias) –

Schema for any prompt, regardless of model type.
SingletonPrompt (TypeAlias) –

Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.

`DecoderOnlyPrompt = str | TextPrompt | list[int] | TokensPrompt | EmbedsPrompt` `module-attribute` ¶

Schema of a prompt for a decoder-only model:

A text prompt (string or TextPrompt)
A tokenized prompt (list of token IDs, or TokensPrompt)
An embeddings prompt (EmbedsPrompt)

For encoder-decoder models, passing a singleton prompt is shorthand for passing ExplicitEncoderDecoderPrompt(encoder_prompt=prompt, decoder_prompt=None).
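
Because the dictionary schemas on this page are `TypedDict`s, they are plain dictionaries at runtime, so the forms listed above can be told apart by type and by key. The following is a minimal sketch of that idea; `classify_prompt` is a hypothetical helper written for illustration and is not part of vLLM, and the token IDs are made up.

```python
from typing import Any


def classify_prompt(prompt: Any) -> str:
    """Return which DecoderOnlyPrompt form `prompt` appears to be.

    Hypothetical helper for illustration; not part of vLLM.
    """
    if isinstance(prompt, str):
        return "text"
    if isinstance(prompt, list) and all(isinstance(t, int) for t in prompt):
        return "tokens"
    if isinstance(prompt, dict):
        # Test the required key of each schema, most specific first:
        # an EmbedsPrompt may also carry prompt_token_ids, and a
        # TokensPrompt may also carry prompt.
        if "prompt_embeds" in prompt:
            return "embeds"
        if "prompt_token_ids" in prompt:
            return "tokens"
        if "prompt" in prompt:
            return "text"
    raise TypeError(f"Unrecognized prompt form: {type(prompt).__name__}")


examples = [
    "Hello, world",                           # bare string
    {"prompt": "Hello, world"},               # TextPrompt
    [101, 7592, 102],                         # bare token IDs (made up)
    {"prompt_token_ids": [101, 7592, 102]},   # TokensPrompt
]
kinds = [classify_prompt(p) for p in examples]
```

The order of the key checks matters only because the optional keys of one schema overlap with the required keys of another.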

`DecoderPrompt = str | TextPrompt | list[int] | TokensPrompt` `module-attribute` ¶

Schema of a prompt for the decoder part of an encoder-decoder model:

A text prompt (string or TextPrompt)
A tokenized prompt (list of token IDs, or TokensPrompt)

Note

Multi-modal inputs are not supported for decoder prompts.

`EncoderDecoderPrompt = EncoderPrompt | ExplicitEncoderDecoderPrompt` `module-attribute` ¶

Schema for a prompt for an encoder-decoder model.

You can pass a singleton encoder prompt, in which case the decoder prompt is considered to be None (i.e., infer automatically).
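
The shorthand described above can be sketched as a small normalization step. This is only an illustration of the rule as stated on this page, not vLLM's own code; `to_explicit` is a hypothetical helper, and the example prompt strings are invented.

```python
from typing import Any, Dict


def to_explicit(prompt: Any) -> Dict[str, Any]:
    """Expand a singleton encoder prompt into the explicit pair form.

    Hypothetical helper for illustration; not part of vLLM.
    """
    if isinstance(prompt, dict) and "encoder_prompt" in prompt:
        # Already an explicit pair: return a shallow copy unchanged.
        return dict(prompt)
    # Singleton form: the decoder prompt is left as None (infer automatically).
    return {"encoder_prompt": prompt, "decoder_prompt": None}


shorthand = to_explicit("Translate to French: Hello")
explicit = to_explicit(
    {"encoder_prompt": "Translate to French: Hello", "decoder_prompt": "Bonjour"}
)
```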

`EncoderPrompt = str | TextPrompt | list[int] | TokensPrompt` `module-attribute` ¶

Schema of a prompt for the encoder part of an encoder-decoder model:

A text prompt (string or TextPrompt)
A tokenized prompt (list of token IDs, or TokensPrompt)

`ModalityData = _T | list[_T | None] | None` `module-attribute` ¶

Either a single data item, or a list of data items. Can only be None if UUID is provided.

The number of data items allowed per modality is restricted by --limit-mm-per-prompt.
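
To make the "single item or list" shape concrete, here is a minimal sketch that normalizes a value to a list and applies a per-modality cap. `normalize_items` and its `limit` parameter are hypothetical stand-ins for illustration (the real cap comes from `--limit-mm-per-prompt`), the strings stand in for real image objects, and handling of a bare `None` is deliberately left out.

```python
from typing import Any, List


def normalize_items(data: Any, limit: int) -> List[Any]:
    """Wrap a single item in a list and enforce a per-modality limit.

    Hypothetical helper for illustration; not part of vLLM.
    """
    # A list is taken as-is (entries may be None); anything else is one item.
    items = list(data) if isinstance(data, list) else [data]
    if len(items) > limit:
        raise ValueError(f"got {len(items)} items, but the limit is {limit}")
    return items


single = normalize_items("cat.png", limit=1)
several = normalize_items(["cat.png", "dog.png"], limit=2)
```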

`MultiModalDataDict = Mapping[str, ModalityData[Any]]` `module-attribute` ¶

A dictionary containing an entry for each modality type to input.

The built-in modalities are defined by MultiModalDataBuiltins.

`MultiModalUUIDDict = Mapping[str, Sequence[str | None] | str]` `module-attribute` ¶

A dictionary containing user-provided UUIDs for items in each modality. If a UUID for an item is not provided, its entry will be None and MultiModalHasher will compute a hash for the item.

The UUID will be used to identify the item for all caching purposes (input processing caching, embedding caching, prefix caching, etc).
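
The fallback rule above (use the UUID when given, otherwise hash the item) can be sketched as follows. This is an illustration under stated assumptions: `resolve_ids` is a hypothetical helper, items are represented as raw bytes, and SHA-256 from `hashlib` stands in for whatever `MultiModalHasher` actually computes.

```python
import hashlib
from typing import List, Optional, Sequence


def resolve_ids(items: Sequence[bytes], uuids: Sequence[Optional[str]]) -> List[str]:
    """Pick the user UUID when present, otherwise a content hash.

    Hypothetical helper; SHA-256 is a stand-in for MultiModalHasher.
    """
    if len(items) != len(uuids):
        raise ValueError("need exactly one UUID entry (or None) per item")
    resolved = []
    for item, uuid in zip(items, uuids):
        if uuid is None:
            # No UUID supplied: fall back to a hash of the item's bytes.
            resolved.append(hashlib.sha256(item).hexdigest())
        else:
            resolved.append(uuid)
    return resolved


ids = resolve_ids([b"first image", b"second image"], ["my-image-1", None])
```

Here the first item keeps its user-provided identifier, while the second is identified by its content hash.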

`PromptType = DecoderOnlyPrompt | EncoderDecoderPrompt` `module-attribute` ¶

Schema for any prompt, regardless of model type.

This is the input format accepted by most LLM APIs.

`SingletonPrompt = DecoderOnlyPrompt | EncoderPrompt | DecoderPrompt` `module-attribute` ¶

Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.

`DataPrompt` ¶

Bases: _PromptOptions

Represents generic inputs that are converted to PromptType by IO processor plugins.

Attributes:

data (Any) –

The input data.
data_format (str) –

The input data format.

Source code in vllm/inputs/llm.py

```python
class DataPrompt(_PromptOptions):
    """
    Represents generic inputs that are converted to
    [`PromptType`][vllm.inputs.llm.PromptType] by IO processor plugins.
    """

    data: Any
    """The input data."""

    data_format: str
    """The input data format."""
```

`data` `instance-attribute` ¶

The input data.

`data_format` `instance-attribute` ¶

The input data format.

`EmbedsPrompt` ¶

Bases: _PromptOptions

Schema for a prompt provided via token embeddings.

Attributes:

prompt (NotRequired[str]) –

The prompt text corresponding to the token embeddings, if available.
prompt_embeds (Tensor) –

The embeddings of the prompt.
prompt_is_token_ids (NotRequired[list[bool]]) –

Per-position mask: True uses the real token ID, False uses the corresponding entry from prompt_embeds.
prompt_token_ids (NotRequired[list[int]]) –

Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts).

Source code in vllm/inputs/llm.py

```python
class EmbedsPrompt(_PromptOptions):
    """Schema for a prompt provided via token embeddings."""

    prompt_embeds: "torch.Tensor"
    """The embeddings of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token embeddings, if available."""

    prompt_token_ids: NotRequired[list[int]]
    """Token IDs for mixed-mode inputs (chat completion with
    `prompt_embeds` content parts). The tokens at positions where
    `prompt_is_token_ids` is `False` are placeholder tokens that
    get replaced by entries from `prompt_embeds` in the forward pass."""

    prompt_is_token_ids: NotRequired[list[bool]]
    """Per-position mask, `True` uses the real token ID, `False` uses
    the corresponding entry from `prompt_embeds`.
    Must be the same length as `prompt_token_ids` when both are set."""
```

`prompt` `instance-attribute` ¶

The prompt text corresponding to the token embeddings, if available.

`prompt_embeds` `instance-attribute` ¶

The embeddings of the prompt.

`prompt_is_token_ids` `instance-attribute` ¶

Per-position mask: True uses the real token ID; False uses the corresponding entry from prompt_embeds. Must be the same length as prompt_token_ids when both are set.

`prompt_token_ids` `instance-attribute` ¶

Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts). The tokens at positions where prompt_is_token_ids is False are placeholder tokens that get replaced by entries from prompt_embeds in the forward pass.
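
The relationship between `prompt_token_ids` and `prompt_is_token_ids` can be checked with a few lines of plain Python. The sketch below only enforces the length rule stated above and lists the placeholder positions; `placeholder_positions` is a hypothetical helper, the token IDs are made up, and a nested list stands in for the `torch.Tensor` so the example has no dependencies.

```python
from typing import Any, Dict, List


def placeholder_positions(prompt: Dict[str, Any]) -> List[int]:
    """Return positions whose token is a placeholder for an embedding.

    Hypothetical helper for illustration; not part of vLLM.
    """
    token_ids = prompt.get("prompt_token_ids")
    mask = prompt.get("prompt_is_token_ids")
    if token_ids is None or mask is None:
        # Both fields are optional; without both there is nothing to check.
        return []
    if len(token_ids) != len(mask):
        raise ValueError("prompt_is_token_ids must match prompt_token_ids in length")
    # False in the mask marks a position filled from prompt_embeds.
    return [i for i, is_token in enumerate(mask) if not is_token]


mixed_prompt = {
    "prompt_embeds": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # stand-in for a tensor
    "prompt_token_ids": [11, 0, 42],
    "prompt_is_token_ids": [True, False, True],
}
positions = placeholder_positions(mixed_prompt)
```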

`ExplicitEncoderDecoderPrompt` ¶

Bases: TypedDict

Schema for a pair of encoder and decoder singleton prompts.

Note

This schema is not valid for decoder-only models.

Attributes:

decoder_prompt (DecoderPrompt | None) –

The prompt for the decoder part of the model.
encoder_prompt (EncoderPrompt) –

The prompt for the encoder part of the model.

Source code in vllm/inputs/llm.py

```python
class ExplicitEncoderDecoderPrompt(TypedDict):
    """
    Schema for a pair of encoder and decoder singleton prompts.

    Note:
        This schema is not valid for decoder-only models.
    """

    encoder_prompt: EncoderPrompt
    """The prompt for the encoder part of the model."""

    decoder_prompt: DecoderPrompt | None
    """
    The prompt for the decoder part of the model.

    Passing `None` will cause the prompt to be inferred automatically.
    """
```

`decoder_prompt` `instance-attribute` ¶

The prompt for the decoder part of the model.

Passing None will cause the prompt to be inferred automatically.

`encoder_prompt` `instance-attribute` ¶

The prompt for the encoder part of the model.

`MultiModalDataBuiltins` ¶

Bases: TypedDict

Type annotations for modality types predefined by vLLM.

Attributes:

audio (ModalityData[AudioItem]) –

The input audio(s).
image (ModalityData[ImageItem]) –

The input image(s).
video (ModalityData[VideoItem]) –

The input video(s).
vision_chunk (ModalityData[VisionChunk]) –

The input visual atom(s) - unified modality for images and video chunks.

Source code in vllm/inputs/llm.py

```python
@final
class MultiModalDataBuiltins(TypedDict, total=False):
    """Type annotations for modality types predefined by vLLM."""

    image: ModalityData["ImageItem"]
    """The input image(s)."""

    video: ModalityData["VideoItem"]
    """The input video(s)."""

    audio: ModalityData["AudioItem"]
    """The input audio(s)."""

    vision_chunk: ModalityData["VisionChunk"]
    """The input visual atom(s) - unified modality for images and video chunks."""
```

`audio` `instance-attribute` ¶

The input audio(s).

`image` `instance-attribute` ¶

The input image(s).

`video` `instance-attribute` ¶

The input video(s).

`vision_chunk` `instance-attribute` ¶

The input visual atom(s) - unified modality for images and video chunks.

`TextPrompt` ¶

Bases: _PromptOptions

Schema for a text prompt.

Attributes:

prompt (str) –

The input text to be tokenized before passing to the model.

Source code in vllm/inputs/llm.py

```python
class TextPrompt(_PromptOptions):
    """Schema for a text prompt."""

    prompt: str
    """The input text to be tokenized before passing to the model."""
```

`prompt` `instance-attribute` ¶

The input text to be tokenized before passing to the model.

`TokensPrompt` ¶

Bases: _PromptOptions

Schema for a tokenized prompt.

Attributes:

prompt (NotRequired[str]) –

The prompt text corresponding to the token IDs, if available.
prompt_token_ids (list[int]) –

A list of token IDs to pass to the model.
token_type_ids (NotRequired[list[int]]) –

A list of token type IDs to pass to the cross encoder model.

Source code in vllm/inputs/llm.py

```python
class TokensPrompt(_PromptOptions):
    """Schema for a tokenized prompt."""

    prompt_token_ids: list[int]
    """A list of token IDs to pass to the model."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

    token_type_ids: NotRequired[list[int]]
    """A list of token type IDs to pass to the cross encoder model."""
```

`prompt` `instance-attribute` ¶

The prompt text corresponding to the token IDs, if available.

`prompt_token_ids` `instance-attribute` ¶

A list of token IDs to pass to the model.

`token_type_ids` `instance-attribute` ¶

A list of token type IDs to pass to the cross encoder model.
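
As a rough illustration of how `token_type_ids` might accompany `prompt_token_ids`, the sketch below builds a dictionary shaped like a TokensPrompt for a two-segment input. It assumes the BERT-style convention of 0 for the first segment and 1 for the second, which this page does not itself specify; `build_pair_prompt` is a hypothetical helper and the token IDs are made up.

```python
from typing import Any, Dict, List


def build_pair_prompt(query_ids: List[int], doc_ids: List[int]) -> Dict[str, Any]:
    """Concatenate two token sequences and mark which segment each belongs to.

    Hypothetical helper; assumes BERT-style segment IDs (0, then 1).
    """
    return {
        "prompt_token_ids": query_ids + doc_ids,
        "token_type_ids": [0] * len(query_ids) + [1] * len(doc_ids),
    }


pair_prompt = build_pair_prompt([101, 2054, 102], [2023, 102])
```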

`_PromptOptions` ¶

Bases: TypedDict

Additional options available to all SingletonPrompt types.

Attributes:

cache_salt (NotRequired[str]) –

Optional cache salt to be used for prefix caching.
mm_processor_kwargs (NotRequired[dict[str, Any] | None]) –

Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor.
multi_modal_data (NotRequired[MultiModalDataDict | None]) –

Optional multi-modal data to pass to the model, if the model supports it.
multi_modal_uuids (NotRequired[MultiModalUUIDDict]) –

Optional user-specified UUIDs for multimodal items, mapped by modality.

Source code in vllm/inputs/llm.py

```python
class _PromptOptions(TypedDict):
    """
    Additional options available to all
    [`SingletonPrompt`][vllm.inputs.llm.SingletonPrompt] types.
    """

    multi_modal_data: NotRequired[MultiModalDataDict | None]
    """
    Optional multi-modal data to pass to the model,
    if the model supports it.
    """

    mm_processor_kwargs: NotRequired[dict[str, Any] | None]
    """
    Optional multi-modal processor kwargs to be forwarded to the
    multimodal input mapper & processor. Note that if multiple modalities
    have registered mappers etc for the model being considered, we attempt
    to pass the mm_processor_kwargs to each of them.
    """

    multi_modal_uuids: NotRequired[MultiModalUUIDDict]
    """
    Optional user-specified UUIDs for multimodal items, mapped by modality.
    Lists must match the number of items per modality and may contain `None`.
    For `None` entries, the hasher will compute IDs automatically; non-None
    entries override the default hashes for caching, and MUST be unique per
    multimodal item.
    """

    cache_salt: NotRequired[str]
    """
    Optional cache salt to be used for prefix caching.
    """
```

`cache_salt` `instance-attribute` ¶

Optional cache salt to be used for prefix caching.

`mm_processor_kwargs` `instance-attribute` ¶

Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor. Note that if multiple modalities have registered mappers etc for the model being considered, we attempt to pass the mm_processor_kwargs to each of them.

`multi_modal_data` `instance-attribute` ¶

Optional multi-modal data to pass to the model, if the model supports it.

`multi_modal_uuids` `instance-attribute` ¶

Optional user-specified UUIDs for multimodal items, mapped by modality. Lists must match the number of items per modality and may contain None. For None entries, the hasher will compute IDs automatically; non-None entries override the default hashes for caching, and MUST be unique per multimodal item.
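
The two constraints stated above (one UUID entry per item, and no repeated non-None UUID) can be checked before a prompt is submitted. The following is a minimal sketch under stated assumptions: `check_uuids` is a hypothetical helper rather than a vLLM function, uniqueness is checked across the whole prompt, and any value that is not a list is counted as a single item.

```python
from typing import Any, Mapping, Set


def check_uuids(mm_data: Mapping[str, Any], mm_uuids: Mapping[str, Any]) -> None:
    """Raise ValueError if the UUIDs do not line up with the data items.

    Hypothetical helper for illustration; not part of vLLM.
    """
    seen: Set[str] = set()
    for modality, uuids in mm_uuids.items():
        items = mm_data.get(modality)
        # Assumption: a non-list value counts as exactly one item.
        n_items = len(items) if isinstance(items, list) else 1
        uuid_list = [uuids] if isinstance(uuids, str) else list(uuids)
        if len(uuid_list) != n_items:
            raise ValueError(
                f"{modality}: {len(uuid_list)} UUID entries for {n_items} items"
            )
        for uuid in uuid_list:
            if uuid is None:
                continue  # the hasher would compute an ID for this item
            if uuid in seen:
                raise ValueError(f"duplicate UUID: {uuid}")
            seen.add(uuid)


# Strings stand in for real image objects.
check_uuids(
    {"image": ["cat.png", "dog.png"]},
    {"image": ["img-cat", None]},
)
```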
