vllm.utils
Modules:
- argparse_utils – Argument parsing utilities for vLLM.
- async_utils – Contains helpers related to asynchronous code.
- cache
- collection_utils – Contains helpers that are applied to collections.
- counter
- cpu_resource_utils
- cpu_triton_utils – Contains replacement functions to fallback Triton usages in CPU backend
- deep_gemm – Compatibility wrapper for DeepGEMM API changes.
- flashinfer – Compatibility wrapper for FlashInfer API changes.
- func_utils – Contains helpers that are applied to functions.
- gc_utils
- hashing
- import_utils – Contains helpers related to importing modules.
- jsontree – Helper functions to work with nested JSON structures.
- math_utils – Math utility functions for vLLM.
- mem_constants
- mem_utils
- mistral – Provides lazy import of the vllm.tokenizers.mistral module.
- multi_stream_utils
- nccl
- network_utils
- numa_utils – NUMA binding utilities for vLLM worker processes.
- nvtx_pytorch_hooks
- ompmultiprocessing – OMP Aware Multiprocessing manager for running multiprocessing.Process()
- platform_utils
- registry
- system_utils
- tensor_schema
- torch_utils
Functions:
- length_from_prompt_token_ids_or_embeds – Calculate the request length (in number of tokens) given either prompt_token_ids or prompt_embeds.
length_from_prompt_token_ids_or_embeds(prompt_token_ids, prompt_embeds)
Calculate the request length (in number of tokens) given either prompt_token_ids or prompt_embeds.
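The behavior described above can be illustrated with a minimal, dependency-free sketch. It is not the vLLM implementation: `sketch_request_length` is a hypothetical stand-in, nested Python lists take the place of an embeddings tensor (one row per token), and both the error handling and the length-consistency check when both inputs are supplied are assumptions.

```python
from collections.abc import Sequence
from typing import Optional


def sketch_request_length(
    prompt_token_ids: Optional[Sequence[int]],
    prompt_embeds: Optional[Sequence[Sequence[float]]],
) -> int:
    """Hypothetical stand-in: prompt length in tokens from whichever input is present."""
    if prompt_token_ids is not None:
        token_len = len(prompt_token_ids)
        # Assumption: if both inputs are supplied, their lengths should agree.
        if prompt_embeds is not None and len(prompt_embeds) != token_len:
            raise ValueError("prompt_token_ids and prompt_embeds differ in length")
        return token_len
    if prompt_embeds is not None:
        # One embedding row per token, so the row count is the request length.
        return len(prompt_embeds)
    # Assumption: supplying neither input is treated as an error.
    raise ValueError("either prompt_token_ids or prompt_embeds is required")


print(sketch_request_length([101, 2023, 102], None))          # prints 3
print(sketch_request_length(None, [[0.1, 0.2], [0.3, 0.4]]))  # prints 2
```

With a real embeddings tensor, the first dimension would play the role of the row count used here.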