vllm.model_executor.layers.quantization.base_config
Classes:
- QuantizationConfig – Base class for quantization configs.
- QuantizeMethodBase – Base class for different quantized methods.
Functions:
- method_has_implemented_embedding – Not all quant methods have embedding implemented, so we need to check that it exists for our given method.
QuantizationConfig
Bases: ABC
Base class for quantization configs.
Methods:
- apply_vllm_mapper – Interface for models to update module names referenced in quantization configs in order to reflect the vLLM model structure.
- from_config – Create a config class from the model's quantization config.
- get_cache_scale_mapper – Mapping from checkpoint KV-cache scale names to vLLM scale names.
- get_config_filenames – List of filenames to search for in the model directory.
- get_from_keys – Get a value from the model's quantization config.
- get_from_keys_or – Get an optional value from the model's quantization config.
- get_min_capability – Minimum GPU capability to support the quantization method.
- get_name – Name of the quantization method.
- get_quant_method – Get the quantize method to use for the quantized layer.
- get_supported_act_dtypes – List of supported activation dtypes.
- is_mxfp4_quant – Determine if mxfp4 quantization will be used for this config.
- maybe_update_config – Interface to update values after config initialization.
- override_quantization_method – Detects if this quantization method can support a given checkpoint format by overriding the user-specified quantization method.
Source code in vllm/model_executor/layers/quantization/base_config.py
apply_vllm_mapper(hf_to_vllm_mapper)
Interface for models to update module names referenced in quantization configs in order to reflect the vLLM model structure.
Parameters:
- hf_to_vllm_mapper (WeightsMapper) – Maps from the HF model structure (the assumed structure of the qconfig) to the vLLM model structure.
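As a rough illustration of the renaming described above, the sketch below rewrites module names taken from a quantization config using a prefix map. The `remap_module_names` helper, the prefix map, and the module names are hypothetical stand-ins; the real `WeightsMapper` class is not used here and its API may differ.

```python
def remap_module_names(names: list[str], prefix_map: dict[str, str]) -> list[str]:
    """Rewrite each name whose prefix appears in prefix_map (hypothetical helper)."""
    remapped = []
    for name in names:
        for old, new in prefix_map.items():
            if name.startswith(old):
                name = new + name[len(old):]
                break  # apply at most one prefix rule per name
        remapped.append(name)
    return remapped


# Hypothetical example: the checkpoint config lists modules using HF names.
hf_ignored = ["model.vision_tower.proj", "lm_head"]
prefix_map = {"model.vision_tower.": "vision_tower."}
vllm_ignored = remap_module_names(hf_ignored, prefix_map)
```

Names that match no prefix rule pass through unchanged, so a config that already uses vLLM names is left as it is.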
from_config(config) abstractmethod classmethod
Create a config class from the model's quantization config.
get_cache_scale_mapper()
Mapping from checkpoint KV-cache scale names to vLLM scale names.
Returning a mapper here causes AutoWeightsLoader to apply it to the weight stream automatically; individual model load_weights methods do not need to know about KV-cache scales.
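To make the idea of applying a mapper to the weight stream concrete, here is a minimal sketch that renames entries in a stream of `(name, value)` pairs by suffix. The `map_scale_names` function and the scale names are assumptions chosen for illustration; they do not reproduce the actual mapper type or the names vLLM uses.

```python
from collections.abc import Iterable, Iterator
from typing import Any


def map_scale_names(
    weights: Iterable[tuple[str, Any]],
    suffix_map: dict[str, str],
) -> Iterator[tuple[str, Any]]:
    """Yield (name, value) pairs, renaming names that end with a mapped suffix."""
    for name, value in weights:
        for old, new in suffix_map.items():
            if name.endswith(old):
                name = name[: len(name) - len(old)] + new
                break  # apply at most one suffix rule per name
        yield name, value


# Hypothetical checkpoint and target names, chosen only for illustration.
suffix_map = {".k_proj.k_scale": ".attn.k_scale"}
stream = [("layers.0.self_attn.k_proj.k_scale", 0.5), ("layers.0.mlp.weight", 1.0)]
renamed = list(map_scale_names(stream, suffix_map))
```

Because the renaming happens on the stream itself, code that consumes the weights only ever sees the target names.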
get_config_filenames() abstractmethod staticmethod
List of filenames to search for in the model directory.
get_from_keys(config, keys) staticmethod
Get a value from the model's quantization config.
get_from_keys_or(config, keys, default) staticmethod
Get an optional value from the model's quantization config.
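The two lookup helpers can be sketched as standalone functions. This is one plausible reading of their behavior: trying the keys in order and raising `ValueError` when none is present are assumptions, as are the example key names.

```python
from typing import Any


def get_from_keys(config: dict[str, Any], keys: list[str]) -> Any:
    """Return the value of the first key present in config, else raise."""
    for key in keys:
        if key in config:
            return config[key]
    raise ValueError(f"Cannot find any of {keys} in the quantization config.")


def get_from_keys_or(config: dict[str, Any], keys: list[str], default: Any) -> Any:
    """Like get_from_keys, but fall back to default when no key is present."""
    try:
        return get_from_keys(config, keys)
    except ValueError:
        return default


# Hypothetical config dict; real checkpoints may use different key names.
quant_cfg = {"bits": 4, "group_size": 128}
weight_bits = get_from_keys(quant_cfg, ["w_bit", "bits"])
zero_point = get_from_keys_or(quant_cfg, ["zero_point"], default=False)
```

Accepting a list of keys lets one config class read checkpoints that spell the same setting differently.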
get_min_capability() abstractmethod classmethod
Minimum GPU capability to support the quantization method.
E.g., 70 for Volta, 75 for Turing, 80 for Ampere. This requirement is due to the custom CUDA kernels used by the quantization method.
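The integers above follow the usual encoding of a CUDA compute capability as `major * 10 + minor`. The sketch below shows that encoding and a simple support check; both helper names are hypothetical and are not part of this module.

```python
def capability_to_int(major: int, minor: int) -> int:
    """Encode a CUDA compute capability (major, minor) as major * 10 + minor."""
    return major * 10 + minor


def supports_method(device_capability: tuple[int, int], min_capability: int) -> bool:
    """Return True if the device meets the method's minimum capability."""
    return capability_to_int(*device_capability) >= min_capability


# Compute capabilities for the architectures named in the text.
volta, turing, ampere = (7, 0), (7, 5), (8, 0)
```

For example, a method whose minimum capability is 80 would be rejected on a Turing device (75) and accepted on an Ampere device (80).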
get_name() abstractmethod
Name of the quantization method.
get_quant_method(layer, prefix) abstractmethod
Get the quantize method to use for the quantized layer.
Parameters:
- layer (Module) – The layer for the quant method.
- prefix (str) – The full name of the layer in the state dict.
Returns: The quantize method, or None if the given layer doesn't support the quant method.
get_supported_act_dtypes() abstractmethod
List of supported activation dtypes.
is_mxfp4_quant(prefix, layer)
Determine if mxfp4 quantization will be used for this config.
This allows hidden_size rounding to happen before moe_config creation without needing to instantiate quant_method first.
Parameters:
Returns:
- bool – True if this config uses MXFP4 quantization, False otherwise.
maybe_update_config(model_name, hf_config=None, revision=None)
Interface to update values after config initialization.
Parameters:
- model_name (str) – The name of the model.
- hf_config (PretrainedConfig | None, default: None) – The Hugging Face config of the model.
- revision (str | None, default: None) – The revision of the model.
Returns:
override_quantization_method(hf_quant_cfg, user_quant, hf_config=None) classmethod
Detects whether this quantization method can support a given checkpoint format by overriding the user-specified quantization method. Subclasses should override this method only in exceptional circumstances.
Parameters:
- hf_quant_cfg (dict[str, Any]) – The checkpoint's quantization config dict.
- user_quant (str | None) – The user-specified quantization method string.
- hf_config (Any, default: None) – The Hugging Face model config object (e.g. for model_type checks). May be None if not available.
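A subclass override might look roughly like the following. This is a sketch under stated assumptions: `ToyFastConfig`, the `"toy"` and `"toy_fast"` method names, the `quant_method` key, and the convention of returning the method name (or None to decline) are all hypothetical and are not confirmed by the text above.

```python
from __future__ import annotations

from typing import Any


class ToyFastConfig:
    """Hypothetical config that can also run checkpoints saved as 'toy'."""

    @classmethod
    def get_name(cls) -> str:
        return "toy_fast"

    @classmethod
    def override_quantization_method(
        cls,
        hf_quant_cfg: dict[str, Any],
        user_quant: str | None,
        hf_config: Any = None,
    ) -> str | None:
        # Only take over when the checkpoint format is one this method can run.
        compatible = hf_quant_cfg.get("quant_method") == "toy"
        # Respect an explicit user request for a different method.
        user_ok = user_quant is None or user_quant == cls.get_name()
        return cls.get_name() if compatible and user_ok else None
```

Declining whenever the user names another method keeps the override from silently replacing an explicit choice.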
QuantizeMethodBase
Bases: ABC
Base class for different quantized methods.
Methods:
- apply – Apply the weights in layer to the input tensor.
- create_weights – Create weights for a layer.
- embedding – Gather embeddings in the layer based on indices in the input tensor.
- process_weights_after_loading – Process the weight after loading.
apply(layer, *args, **kwargs) abstractmethod
Apply the weights in layer to the input tensor.
Expects create_weights to have been called before on the layer.
create_weights(layer, *weight_args, **extra_weight_attrs) abstractmethod
Create weights for a layer.
The weights will be set as attributes of the layer.
embedding(layer, *args, **kwargs)
Gather embeddings in the layer based on indices in the input tensor.
Expects create_weights to have been called before on the layer.
process_weights_after_loading(layer)
Process the weight after loading.
This can be used, for example, to transpose weights for computation.
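The order in which these methods are meant to be called (create the weights, load them, post-process, then apply) can be sketched with a toy example in plain Python. `ToyLayer` and `ToyLinearMethod` are hypothetical and use nested lists in place of tensors; they do not subclass the real base class.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer; attributes are attached dynamically."""


class ToyLinearMethod:
    """Hypothetical method following the create -> process -> apply order."""

    def create_weights(self, layer, in_features: int, out_features: int) -> None:
        # Weights are stored as attributes of the layer, shape [out][in].
        layer.weight = [[0.0] * in_features for _ in range(out_features)]

    def process_weights_after_loading(self, layer) -> None:
        # Transpose once to [in][out] so apply can index input-major.
        layer.weight = [list(col) for col in zip(*layer.weight)]

    def apply(self, layer, x: list[float]) -> list[float]:
        # Compute y[j] = sum_i x[i] * W[i][j] using the transposed weights.
        out_features = len(layer.weight[0])
        return [
            sum(x[i] * layer.weight[i][j] for i in range(len(x)))
            for j in range(out_features)
        ]


layer, method = ToyLayer(), ToyLinearMethod()
method.create_weights(layer, in_features=2, out_features=3)
layer.weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # simulate checkpoint loading
method.process_weights_after_loading(layer)
y = method.apply(layer, [2.0, 3.0])
```

Doing the transpose once after loading, rather than on every call to `apply`, is the kind of one-time preparation the post-processing hook is suited for.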
method_has_implemented_embedding(method_class)
Not all quant methods have embedding implemented, so we need to check that it exists for our given method. We check this by making sure the function has been changed from the base implementation.
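One way to check that a method has been changed from the base implementation is to compare the attributes found by static lookup on each class. The sketch below uses hypothetical toy classes and `inspect.getattr_static`; it illustrates the technique and is not necessarily how the function is written in vLLM.

```python
import inspect


class Base:
    """Hypothetical base class with a default embedding that is not usable."""

    def embedding(self, layer, *args, **kwargs):
        raise NotImplementedError


class WithEmbedding(Base):
    def embedding(self, layer, *args, **kwargs):
        return "gathered"


class WithoutEmbedding(Base):
    pass


def has_implemented_embedding(method_class: type) -> bool:
    """Return True if method_class overrides Base.embedding."""
    base_attr = inspect.getattr_static(Base, "embedding", None)
    class_attr = inspect.getattr_static(method_class, "embedding", None)
    # An inherited method resolves to the same function object as the base.
    return class_attr is not None and class_attr is not base_attr
```

Comparing by identity avoids calling the method, so the check works even when the base implementation would raise.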