vllm.config.quantization
Classes:
- QuantSpec – Quantization spec for one layer kind (linear or MoE).
- QuantizationConfigArgs – User-facing quantization configuration.
Functions:
- resolve_quantization_config – Resolve --quantization shorthand and --quantization-config into a QuantizationConfigArgs.
QuantSpec
Quantization spec for one layer kind (linear or MoE).
A value of None for either activation or weight means the method class falls back to its own default (typically inherited from the checkpoint, or unquantized for online quantization).
Attributes:
- activation (QuantKeyField) – Activation quantization key, or a name from QUANT_KEY_NAMES.
- weight (QuantKeyField) – Weight quantization key, or a name from QUANT_KEY_NAMES.
Source code in vllm/config/quantization.py
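The None fallback described for QuantSpec can be sketched as follows. This is a minimal, self-contained illustration and not the vLLM class: QuantSpecSketch, effective_key, and the key names "example_weight_key" and "checkpoint_default" are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantSpecSketch:
    """Hypothetical stand-in for QuantSpec (not the vLLM class)."""

    activation: Optional[str] = None
    weight: Optional[str] = None


def effective_key(requested: Optional[str], method_default: Optional[str]) -> Optional[str]:
    """Return the requested key, or the method's own default when it is None."""
    return requested if requested is not None else method_default


# Only the weight side is set; the activation side is left as None.
spec = QuantSpecSketch(weight="example_weight_key")  # placeholder key name

weight_key = effective_key(spec.weight, method_default="checkpoint_default")
activation_key = effective_key(spec.activation, method_default="checkpoint_default")
```

Here the weight side keeps the requested key, while the activation side falls back to whatever the method supplies as its default.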
QuantizationConfigArgs
User-facing quantization configuration.
See docs/features/quantization/online.md for the schema and shorthand string forms accepted on linear and moe.
Attributes:
- ignore (list[str]) – Layers to skip quantization for.
- linear (QuantSpec | None) – Spec applied to LinearBase layers.
- moe (QuantSpec | None) – Spec applied to FusedMoE layers.
Source code in vllm/config/quantization.py
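To make the shape of the three attributes concrete, the sketch below builds an equivalent structure from a plain dict. It assumes nothing about the real class beyond the attribute names listed above: SpecSketch, ConfigArgsSketch, from_dict, and the values "example_weight_key" and "lm_head" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpecSketch:
    """Hypothetical stand-in for QuantSpec."""

    activation: Optional[str] = None
    weight: Optional[str] = None


@dataclass
class ConfigArgsSketch:
    """Hypothetical stand-in mirroring the ignore / linear / moe attributes."""

    ignore: list[str] = field(default_factory=list)
    linear: Optional[SpecSketch] = None
    moe: Optional[SpecSketch] = None


def from_dict(raw: dict) -> ConfigArgsSketch:
    """Build the sketch config from a dict, rejecting unknown top-level fields."""
    unknown = set(raw) - {"ignore", "linear", "moe"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    def to_spec(value: Optional[dict]) -> Optional[SpecSketch]:
        return None if value is None else SpecSketch(**value)

    return ConfigArgsSketch(
        ignore=list(raw.get("ignore", [])),
        linear=to_spec(raw.get("linear")),
        moe=to_spec(raw.get("moe")),
    )


# Quantize linear layers with a placeholder weight key and skip one layer.
cfg = from_dict({"linear": {"weight": "example_weight_key"}, "ignore": ["lm_head"]})
```

In this sketch an omitted moe entry stays None, which matches the QuantSpec | None type shown for that attribute.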
resolve_quantization_config(quantization, quantization_config)
Resolve --quantization shorthand and --quantization-config into a QuantizationConfigArgs.
quantization is a CLI shorthand that desugars into a base config via _ONLINE_SHORTHANDS. quantization_config is a dict or pre-built args object. When both are given, fields explicitly set in quantization_config take precedence over the shorthand.
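The precedence rule can be illustrated with a small sketch using plain dicts. It is not the vLLM implementation: SHORTHANDS_SKETCH is a made-up table (the real contents of _ONLINE_SHORTHANDS are not shown here), the names "example_fp8", "example_fp8_key", and "other_key" are placeholders, and overriding at the level of whole top-level fields is an assumption about merge granularity.

```python
import copy
from typing import Any, Optional

# Hypothetical shorthand table standing in for _ONLINE_SHORTHANDS.
SHORTHANDS_SKETCH: dict[str, dict[str, Any]] = {
    "example_fp8": {
        "linear": {"weight": "example_fp8_key", "activation": "example_fp8_key"},
        "moe": {"weight": "example_fp8_key", "activation": "example_fp8_key"},
    },
}


def resolve_sketch(
    quantization: Optional[str],
    quantization_config: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Desugar the shorthand into a base config, then apply explicit fields on top."""
    if quantization is None:
        base: dict[str, Any] = {}
    else:
        # Copy so the shared shorthand table is never mutated.
        base = copy.deepcopy(SHORTHANDS_SKETCH[quantization])

    # Assumption: any field explicitly present in quantization_config
    # replaces the shorthand's value for that whole field.
    for field_name, value in (quantization_config or {}).items():
        base[field_name] = value
    return base


resolved = resolve_sketch(
    "example_fp8",
    {"moe": {"weight": "other_key"}, "ignore": ["lm_head"]},
)
```

With these inputs the linear spec comes from the shorthand, while moe and ignore come from the explicit config.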