vllm.model_executor.models.transformers.utils
Transformers modeling backend utilities.
Functions:
- can_enable_torch_compile – Callable to be passed to @support_torch_compile's enable_if argument.
- init_on_device_without_buffers – A context manager under which models are initialized with all parameters on the specified device.
- recursive_replace_linear – Recursively replace linear modules in the model as needed.
- replace_conv_class – Replace a Transformers Conv2d/Conv3d with vLLM's Conv2d/Conv3d.
- replace_linear_class – Replace nn.Linear with one of vLLM's tensor parallel linear classes.
- replace_rms_norm_class – Replace a Transformers RMSNorm with vLLM's RMSNorm.
can_enable_torch_compile(vllm_config)
Callable to be passed to @support_torch_compile's enable_if argument.
Defaults to True but is disabled in the following situations:
- The model uses dynamic rope scaling.
Source code in vllm/model_executor/models/transformers/utils.py
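The rule above can be sketched without importing vLLM. This is a minimal illustration, not the library's implementation: it assumes the model's text config exposes a `rope_scaling` dict with a `rope_type` key (as many Hugging Face configs do), and the function names are hypothetical.

```python
from types import SimpleNamespace


def uses_dynamic_rope(text_config) -> bool:
    """Return True if the config requests dynamic rope scaling."""
    # Assumption: rope settings live in a dict attribute named `rope_scaling`
    # that carries a `rope_type` key.
    rope_scaling = getattr(text_config, "rope_scaling", None) or {}
    return rope_scaling.get("rope_type") == "dynamic"


def can_enable_torch_compile_sketch(text_config) -> bool:
    # Enabled by default; disabled only when dynamic rope scaling is in use.
    return not uses_dynamic_rope(text_config)


# Hypothetical configs used only for illustration.
plain = SimpleNamespace(rope_scaling=None)
dynamic = SimpleNamespace(rope_scaling={"rope_type": "dynamic", "factor": 2.0})

print(can_enable_torch_compile_sketch(plain))    # True
print(can_enable_torch_compile_sketch(dynamic))  # False
```

A predicate of this shape takes the config and returns a bool, which is the kind of callable the enable_if argument is described as accepting.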
init_on_device_without_buffers(device)
A context manager under which models are initialized with all parameters on the specified device. Buffers, however, are not initialized on the specified device.
Parameters:
- device (`torch.device`) – Device to initialize all parameters on.
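One way such a context manager can be built is by temporarily patching `nn.Module.register_parameter`, so that parameters are moved as they are registered while buffers are left alone. The sketch below illustrates that pattern under stated assumptions; `params_on_device` is a hypothetical name and this is not necessarily how vLLM implements it.

```python
from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def params_on_device(device: torch.device):
    """Place newly registered parameters on `device`; leave buffers alone."""
    original = nn.Module.register_parameter

    def register_on_device(module, name, param):
        original(module, name, param)
        if param is not None:
            # Re-wrap the registered parameter on the target device.
            moved = module._parameters[name].to(device)
            module._parameters[name] = nn.Parameter(
                moved, requires_grad=param.requires_grad
            )

    nn.Module.register_parameter = register_on_device
    try:
        yield
    finally:
        # Always restore the original method, even if construction fails.
        nn.Module.register_parameter = original


with params_on_device(torch.device("meta")):
    norm = nn.BatchNorm1d(4)

print(norm.weight.device.type)        # meta
print(norm.running_mean.device.type)  # cpu
```

BatchNorm1d is used here because it has both parameters (weight, bias) and buffers (running statistics), which makes the difference visible.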
recursive_replace_linear(model, quant_config, prefix='')
Recursively replace linear modules in the model as needed.
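The general shape of a recursive replacement that tracks a dotted prefix can be sketched with plain PyTorch. `TaggedLinear` and `replace_linears` are hypothetical names for illustration only; the real function also takes a quant_config and decides per module whether a replacement is needed, which this sketch does not model.

```python
from torch import nn


class TaggedLinear(nn.Linear):
    """Hypothetical stand-in for a replacement linear layer."""

    def __init__(self, old: nn.Linear, prefix: str):
        # Same shape as the old layer; weights are freshly initialized,
        # not copied, to keep the sketch short.
        super().__init__(old.in_features, old.out_features, bias=old.bias is not None)
        self.prefix = prefix


def replace_linears(module: nn.Module, prefix: str = "") -> None:
    # Snapshot the children so that replacing one does not disturb iteration.
    for name, child in list(module.named_children()):
        qual_name = f"{prefix}.{name}" if prefix else name
        if isinstance(child, nn.Linear):
            setattr(module, name, TaggedLinear(child, qual_name))
        else:
            replace_linears(child, qual_name)


model = nn.Sequential(nn.Linear(4, 8), nn.Sequential(nn.ReLU(), nn.Linear(8, 2)))
replace_linears(model)
print([m.prefix for m in model.modules() if isinstance(m, TaggedLinear)])
# ['0', '1.1']
```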
replace_conv_class(conv)
Replace a Transformers Conv2d/Conv3d with vLLM's Conv2d/Conv3d.
Parameters:
- conv (TorchConv) – nn.Conv2d or nn.Conv3d to be replaced.
Returns: The new Conv2dLayer or Conv3dLayer. If the conv module is not supported, returns the original conv module.
replace_linear_class(linear, style='replicate', quant_config=None, *, prefix='')
Replace nn.Linear with one of vLLM's tensor parallel linear classes.
Parameters:
- linear (Linear) – nn.Linear to be replaced.
- style (Style, default: 'replicate') – Tensor parallel style of the new linear, e.g. "colwise".
- quant_config (QuantizationConfig | None, default: None) – Quantization config for the new linear.
Returns: The new linear.
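As a rough illustration of how a style string could select a linear class, the sketch below maps styles to class names. The class names are vLLM's tensor parallel linear layers, but the exact set of styles (only "replicate" and "colwise" appear in this page), the "rowwise" entry, and the fallback behavior are assumptions, not the library's confirmed logic.

```python
# Assumed mapping from tensor parallel style to a vLLM linear class name.
STYLE_TO_CLASS = {
    "colwise": "ColumnParallelLinear",
    "rowwise": "RowParallelLinear",  # assumption: not named in this page
    "replicate": "ReplicatedLinear",
}


def pick_linear_class(style: str = "replicate") -> str:
    # Assumption: unknown styles fall back to the replicated layer.
    return STYLE_TO_CLASS.get(style, STYLE_TO_CLASS["replicate"])


print(pick_linear_class())           # ReplicatedLinear
print(pick_linear_class("colwise"))  # ColumnParallelLinear
```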
replace_rms_norm_class(rms_norm, hidden_size)
Replace a Transformers RMSNorm with vLLM's RMSNorm.
This method assumes:
- Weight is stored as weight.
- Epsilon is stored as eps or variance_epsilon.
- with_scale indicates whether the layer has a weight (Gemma3n only).
- var_hidden_size is only ever used for the Intern vision encoder in vLLM, and Transformers doesn't appear to have the same concept.
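The attribute conventions listed above can be read off a module with plain `getattr` lookups. The following is a minimal sketch under those assumptions; `read_rms_norm_attrs` is a hypothetical helper, and the fallback epsilon of 1e-6 and the default of True for with_scale are assumptions rather than documented defaults.

```python
from types import SimpleNamespace


def read_rms_norm_attrs(rms_norm, default_eps: float = 1e-6):
    """Collect epsilon, scale flag, and weight from an RMSNorm-like object."""
    # Epsilon may be stored as `eps` or as `variance_epsilon`.
    eps = getattr(rms_norm, "eps", None)
    if eps is None:
        eps = getattr(rms_norm, "variance_epsilon", default_eps)
    # Assumption: layers without `with_scale` are treated as having a weight.
    has_weight = getattr(rms_norm, "with_scale", True)
    weight = getattr(rms_norm, "weight", None) if has_weight else None
    return eps, has_weight, weight


# Hypothetical stand-ins for Transformers RMSNorm modules.
llama_like = SimpleNamespace(variance_epsilon=1e-5, weight=[1.0, 1.0])
no_scale = SimpleNamespace(eps=1e-6, with_scale=False)

print(read_rms_norm_attrs(llama_like))  # (1e-05, True, [1.0, 1.0])
print(read_rms_norm_attrs(no_scale))    # (1e-06, False, None)
```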