vllm.v1.worker.encoder_cudagraph_defs
Data transfer objects for encoder CUDA graph management.
Classes:
- EncoderCudaGraphCaptureInputs – Everything needed for one CUDA graph capture.
- EncoderCudaGraphConfig – Configuration for encoder CUDA graph management.
- EncoderCudaGraphReplayBuffers – New buffer values for graph replay, computed by the model from actual batch inputs.
- EncoderItemSpec – Description of a single encoder input item.
EncoderCudaGraphCaptureInputs dataclass
Everything needed for one CUDA graph capture.
Returned by prepare_encoder_cudagraph_capture_inputs().
Attributes:
- values – Precomputed tensor buffers that will be recorded into the CUDA graph.
Source code in vllm/v1/worker/encoder_cudagraph_defs.py
values instance-attribute
Precomputed tensor buffers that will be recorded into the CUDA graph. A captured CUDA graph reads from fixed memory addresses, so the manager stores references to these exact tensor objects and copies new data into them in place before each graph.replay() call (buffer identity invariant).
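To illustrate why buffer identity matters, here is a minimal pure-Python sketch in which lists stand in for tensors. `FakeGraph`, `copy_into`, and the `"pixel_values"` key are hypothetical and not part of vLLM; `FakeGraph` only mimics the one property that matters here, namely that a captured graph stays bound to the objects it saw at capture time.

```python
class FakeGraph:
    """Hypothetical stand-in for a captured graph: bound to capture-time objects."""

    def __init__(self, buffers: dict[str, list[float]]):
        # Copy the mapping, but keep references to the very same list objects.
        self._bound = dict(buffers)

    def replay(self) -> float:
        # Reads whatever currently lives in the objects seen at capture time.
        return sum(sum(buf) for buf in self._bound.values())


def copy_into(captured: list[float], new: list[float]) -> None:
    """Overwrite a captured buffer in place, preserving its identity."""
    if len(new) > len(captured):
        raise ValueError("new data does not fit in the captured buffer")
    captured[: len(new)] = new


capture_values = {"pixel_values": [0.0, 0.0, 0.0]}
graph = FakeGraph(capture_values)

# Correct: mutate the captured object in place before replay.
copy_into(capture_values["pixel_values"], [1.0, 2.0, 3.0])
correct = graph.replay()  # 6.0

# Wrong: rebinding the key to a fresh object is invisible to the graph.
capture_values["pixel_values"] = [10.0, 20.0, 30.0]
stale = graph.replay()  # still 6.0
```

With real tensors, the in-place update would typically be a `Tensor.copy_()` into the captured tensor (or a slice of it) rather than rebinding the dictionary entry.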
EncoderCudaGraphConfig dataclass
Configuration for encoder CUDA graph management.
Provided by the model at init time via get_encoder_cudagraph_config(). Values are fixed for the lifetime of the manager.
Attributes:
- buffer_keys (list[str]) – Keys for the tensor buffers recorded into the CUDA graph.
- max_frames_per_video (int) – Maximum number of frames per video.
- modalities (list[str]) – Supported modalities (e.g. ["image"]).
- out_hidden_size (int) – Output hidden dim of the vision encoder.
- padding_logics (dict[str, EncoderCudaGraphPaddingLogic]) – Optional per-buffer replay padding/copy logic.
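A simplified, hypothetical mirror of this dataclass, written only to show the attributes listed above and their documented defaults. Making it `frozen=True` is an assumption chosen to reflect "fixed for the lifetime of the manager"; the real class may not be frozen, and `PaddingLogic` is a placeholder for `EncoderCudaGraphPaddingLogic`, whose signature is not shown on this page.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Callable

# Placeholder callback type (assumption): (captured_buffer, replay_data) -> None.
PaddingLogic = Callable[[list, list], None]


@dataclass(frozen=True)
class EncoderCudaGraphConfigSketch:
    """Hypothetical mirror of EncoderCudaGraphConfig, for illustration only."""

    modalities: list[str]
    buffer_keys: list[str]
    out_hidden_size: int
    max_frames_per_video: int = 1  # image-only models can keep the default
    padding_logics: dict[str, PaddingLogic] = field(default_factory=dict)


cfg = EncoderCudaGraphConfigSketch(
    modalities=["image"],
    buffer_keys=["pixel_values"],
    out_hidden_size=1024,
)
other = EncoderCudaGraphConfigSketch(["image"], ["pixel_values"], 1024)

frozen_ok = False
try:
    cfg.out_hidden_size = 2048  # rejected: the sketch is frozen
except FrozenInstanceError:
    frozen_ok = True
```

`field(default_factory=dict)` gives every instance its own dictionary, avoiding the shared-mutable-default pitfall. Note that `frozen=True` only blocks attribute reassignment; the lists and dict inside can still be mutated.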
buffer_keys instance-attribute
Keys for the tensor buffers recorded into the CUDA graph. By default, before each replay the manager zeros these buffers and then slice-copies the new data into them; padding_logics can override this per key.
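A minimal sketch of the default zero-then-slice-copy step, with Python lists standing in for tensors. `zero_then_slice_copy` is a hypothetical helper, not a vLLM function; with real tensors the analogous operations would presumably be `Tensor.zero_()` followed by `Tensor.copy_()` on a leading slice.

```python
def zero_then_slice_copy(captured: list[float], new: list[float]) -> None:
    """Clear the captured buffer in place, then fill its leading slice.

    Zeroing first means no values from an earlier replay remain in the
    tail of the buffer that the new data does not cover.
    """
    if len(new) > len(captured):
        raise ValueError("replay data larger than the captured buffer")
    # Zero in place; slice assignment keeps the same list object.
    captured[:] = [0.0] * len(captured)
    # Copy the new data into the leading slice.
    captured[: len(new)] = new


buf = [9.0, 9.0, 9.0, 9.0, 9.0]  # leftovers from an earlier replay
zero_then_slice_copy(buf, [1.0, 2.0])
# buf is now [1.0, 2.0, 0.0, 0.0, 0.0]
```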
max_frames_per_video = 1 class-attribute instance-attribute
Maximum number of frames per video. Only relevant when "video" is in modalities. Image-only models can use the default of 1.
modalities instance-attribute
Supported modalities (e.g. ["image"]).
out_hidden_size instance-attribute
Output hidden dim of the vision encoder. Used for DP gather buffer allocation.
padding_logics = field(default_factory=dict) class-attribute instance-attribute
Optional per-buffer replay padding/copy logic. If absent for a key, the manager zeros the capture buffer and slice-copies the replay buffer into it.
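The fallback rule can be sketched as a per-key dispatch: use the model's padding logic when one is registered for a key, otherwise zero and slice-copy. Everything here is hypothetical: the `(captured, new)` callback shape, the helper names, the `"positions"` key, and the repeat-last-value logic are illustrative choices rather than vLLM's API, and lists stand in for tensors.

```python
from typing import Callable

Buffer = list[float]
# Hypothetical callback shape: (captured_buffer, replay_data) -> None.
PaddingLogic = Callable[[Buffer, Buffer], None]


def default_copy(captured: Buffer, new: Buffer) -> None:
    # Fallback: zero the capture buffer, then slice-copy the replay data.
    if len(new) > len(captured):
        raise ValueError("replay data larger than the captured buffer")
    captured[:] = [0.0] * len(captured)
    captured[: len(new)] = new


def apply_replay(
    captured: dict[str, Buffer],
    replay: dict[str, Buffer],
    padding_logics: dict[str, PaddingLogic],
) -> None:
    for key, new in replay.items():
        # Use the model-provided logic if present for this key, else the default.
        logic = padding_logics.get(key, default_copy)
        logic(captured[key], new)


def pad_with_last(captured: Buffer, new: Buffer) -> None:
    # Hypothetical custom logic: pad the tail with the last value, not zeros.
    fill = new[-1] if new else 0.0
    captured[:] = new + [fill] * (len(captured) - len(new))


captured = {"pixel_values": [7.0] * 4, "positions": [7.0] * 4}
apply_replay(
    captured,
    {"pixel_values": [1.0, 2.0], "positions": [5.0, 6.0]},
    padding_logics={"positions": pad_with_last},
)
# captured["pixel_values"] == [1.0, 2.0, 0.0, 0.0]
# captured["positions"]    == [5.0, 6.0, 6.0, 6.0]
```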
EncoderCudaGraphReplayBuffers dataclass
New buffer values for graph replay, computed by the model from actual batch inputs.
Returned by prepare_encoder_cudagraph_replay_buffers(). Keys match EncoderCudaGraphConfig.buffer_keys.
Attributes:
- values – Data to copy into the captured buffers before replay.
values instance-attribute
Data to copy into the captured buffers before replay. None values leave the corresponding captured buffer unchanged.
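A sketch of how `None` entries might be handled when applying replay buffers, assuming the default zero-then-slice-copy for non-`None` values. `apply_replay_values` and the buffer keys are hypothetical; lists stand in for tensors.

```python
from typing import Optional


def apply_replay_values(
    captured: dict[str, list[float]],
    values: dict[str, Optional[list[float]]],
) -> list[str]:
    """Copy replay data into captured buffers, skipping keys whose value is None.

    Returns the keys that were actually updated.
    """
    updated = []
    for key, new in values.items():
        if new is None:
            # None: leave this captured buffer exactly as it is.
            continue
        buf = captured[key]
        buf[:] = [0.0] * len(buf)  # zero in place
        buf[: len(new)] = new      # slice-copy the new data
        updated.append(key)
    return updated


captured = {"pixel_values": [0.0] * 3, "position_ids": [0.5] * 3}
updated = apply_replay_values(
    captured, {"pixel_values": [4.0, 5.0], "position_ids": None}
)
# updated == ["pixel_values"]; captured["position_ids"] is still [0.5, 0.5, 0.5]
```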
EncoderItemSpec dataclass
Description of a single encoder input item.
Returned by get_encoder_cudagraph_item_specs() to describe each image or video in a batch without the manager needing to understand model-specific input formats.
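One plausible way a manager could consume these specs, shown purely as an illustration: total the per-item output tokens and pick the smallest pre-captured token budget that fits. `ItemSpecSketch`, `pick_budget`, and the budget values are hypothetical; this page does not state how the manager actually chooses a graph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemSpecSketch:
    """Simplified stand-in for EncoderItemSpec (fields as documented for it)."""

    input_size: int     # input patches/rows for this item
    output_tokens: int  # output tokens after encoder processing


def pick_budget(specs: list[ItemSpecSketch], budgets: list[int]) -> Optional[int]:
    """Return the smallest budget covering the batch's output tokens, or None."""
    total = sum(s.output_tokens for s in specs)
    for budget in sorted(budgets):
        if total <= budget:
            return budget
    return None  # hypothetical: no captured graph fits this batch


batch = [
    ItemSpecSketch(input_size=1024, output_tokens=256),
    ItemSpecSketch(input_size=4096, output_tokens=1024),
]
print(pick_budget(batch, budgets=[512, 2048, 8192]))  # 2048
```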
Attributes:
- input_size (int) – Number of input patches/rows for this item.
- output_tokens (int) – Number of output tokens after encoder processing (e.g. after