`vllm.utils`

Modules:

argparse_utils –

Argument parsing utilities for vLLM.
async_utils –

Contains helpers related to asynchronous code.
cache –
collection_utils –

Contains helpers that are applied to collections.
counter –
cpu_resource_utils –
cpu_triton_utils –

Contains replacement functions that serve as fallbacks for Triton usage in the CPU backend.
deep_gemm –

Compatibility wrapper for DeepGEMM API changes.
flashinfer –

Compatibility wrapper for FlashInfer API changes.
func_utils –

Contains helpers that are applied to functions.
gc_utils –
hashing –
import_utils –

Contains helpers related to importing modules.
jsontree –

Helper functions to work with nested JSON structures.
math_utils –

Math utility functions for vLLM.
mem_constants –
mem_utils –
mistral –

Provides lazy import of the vllm.tokenizers.mistral module.
multi_stream_utils –
nccl –
network_utils –
numa_utils –

NUMA binding utilities for vLLM worker processes.
nvtx_pytorch_hooks –
ompmultiprocessing –

OMP-aware multiprocessing manager for running multiprocessing.Process()
platform_utils –
registry –
system_utils –
tensor_schema –
torch_utils –

Functions:

length_from_prompt_token_ids_or_embeds –

Calculate the request length (in number of tokens) given either prompt_token_ids or prompt_embeds.

`length_from_prompt_token_ids_or_embeds(prompt_token_ids, prompt_embeds)`

Calculate the request length (in number of tokens) given either prompt_token_ids or prompt_embeds. If both are provided, their lengths must match; if neither is provided, a ValueError is raised.

Source code in vllm/utils/__init__.py

def length_from_prompt_token_ids_or_embeds(
    prompt_token_ids: list[int] | torch.Tensor | None,
    prompt_embeds: torch.Tensor | None,
) -> int:
    """Calculate the request length (in number of tokens) given either
    prompt_token_ids or prompt_embeds.
    """
    prompt_token_len = None if prompt_token_ids is None else len(prompt_token_ids)
    prompt_embeds_len = None if prompt_embeds is None else len(prompt_embeds)

    if prompt_token_len is None:
        if prompt_embeds_len is None:
            raise ValueError("Neither prompt_token_ids nor prompt_embeds were defined.")
        return prompt_embeds_len
    else:
        if prompt_embeds_len is not None and prompt_embeds_len != prompt_token_len:
            raise ValueError(
                "Prompt token ids and prompt embeds had different lengths"
                f" prompt_token_ids={prompt_token_len}"
                f" prompt_embeds={prompt_embeds_len}"
            )
        return prompt_token_len
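
The three outcomes of the function (only one input given, both given with equal lengths, and the two error cases) can be illustrated with a small sketch. To keep it runnable without vLLM or PyTorch installed, the snippet below uses a hypothetical local stand-in named `request_length` that mirrors the logic of the listing above, with plain Python lists standing in for `torch.Tensor`. That substitution is an assumption made for illustration: it relies only on `len()` returning the size of the first dimension, which is the one operation the original function performs on its inputs.

```python
from __future__ import annotations

from collections.abc import Sequence


def request_length(
    prompt_token_ids: Sequence[int] | None,
    prompt_embeds: Sequence[Sequence[float]] | None,
) -> int:
    """Hypothetical stand-in mirroring length_from_prompt_token_ids_or_embeds."""
    token_len = None if prompt_token_ids is None else len(prompt_token_ids)
    embeds_len = None if prompt_embeds is None else len(prompt_embeds)

    if token_len is None:
        if embeds_len is None:
            raise ValueError("Neither prompt_token_ids nor prompt_embeds were defined.")
        return embeds_len
    if embeds_len is not None and embeds_len != token_len:
        raise ValueError(
            "Prompt token ids and prompt embeds had different lengths"
            f" prompt_token_ids={token_len}"
            f" prompt_embeds={embeds_len}"
        )
    return token_len


# Illustrative inputs: three tokens, and one embedding row per token.
token_ids = [101, 2023, 2003]
embeds = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

only_ids = request_length(token_ids, None)    # length taken from the token ids
only_embeds = request_length(None, embeds)    # length taken from the embedding rows
both = request_length(token_ids, embeds)      # lengths agree, so either one is returned


def raises_value_error(ids, emb) -> bool:
    """Return True if the call is rejected with ValueError."""
    try:
        request_length(ids, emb)
    except ValueError:
        return True
    return False


neither_rejected = raises_value_error(None, None)          # no input at all
mismatch_rejected = raises_value_error(token_ids, embeds[:2])  # 3 tokens vs 2 rows
```

Note that an empty list is not the same as `None` here: an empty `prompt_token_ids` yields a length of 0 rather than falling through to `prompt_embeds`, because the checks test for `None` and not for truthiness.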
