vllm.model_executor.model_loader.utils
Utilities for selecting and loading models.
Classes:
- ParamMapping – A class to handle parameter mapping for model weight loading.
Functions:
- configure_quant_config – Pass packed_modules_mapping by reference to quant_config so that quant_config can properly match fused modules.
- initialize_model – Initialize a model with the given configurations.
_MODEL_ARCH_BY_HASH = dict[int, tuple[type[nn.Module], str]]() module-attribute
Caches the outputs of _get_model_architecture.
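As a rough illustration of the caching pattern, the sketch below keeps a module-level dictionary keyed by an integer hash and only calls the resolver on a miss. It is not vLLM's code: `_ARCH_CACHE`, `DummyModel`, `resolve_architecture`, and `expensive_resolver` are hypothetical names, and this page does not say what the integer key is hashed from.

```python
from typing import Callable

# Hypothetical stand-in for a cache shaped like
# dict[int, tuple[type[nn.Module], str]].
_ARCH_CACHE: dict[int, tuple[type, str]] = {}


class DummyModel:
    """Placeholder for an nn.Module subclass (assumption for this sketch)."""


def resolve_architecture(
    config_hash: int,
    resolver: Callable[[], tuple[type, str]],
) -> tuple[type, str]:
    """Return the cached (model class, architecture name) pair.

    The resolver is only invoked when config_hash is not yet cached.
    """
    if config_hash not in _ARCH_CACHE:
        _ARCH_CACHE[config_hash] = resolver()
    return _ARCH_CACHE[config_hash]


calls: list[int] = []


def expensive_resolver() -> tuple[type, str]:
    # Record each invocation so the caching effect is observable.
    calls.append(1)
    return DummyModel, "DummyModelForCausalLM"


first = resolve_architecture(1234, expensive_resolver)
second = resolve_architecture(1234, expensive_resolver)
```

The second lookup reuses the stored tuple, so the resolver runs once per distinct key.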
ParamMapping dataclass
A class to handle parameter mapping for model weight loading. It creates a bidirectional mapping between packed parameters and their constituent parts.
Source code in vllm/model_executor/model_loader/utils.py
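One way to picture a bidirectional mapping between packed parameters and their parts is the minimal sketch below. It is an illustration under assumptions, not the ParamMapping implementation: the class name `PackedParamMap`, its field names, and the example module names (`qkv_proj`, `gate_up_proj`, and their parts) are hypothetical choices for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PackedParamMap:
    """Hypothetical sketch of a two-way packed-parameter mapping."""

    # Forward direction: packed parameter name -> ordered constituent parts.
    packed_to_parts: dict[str, list[str]]
    # Reverse direction: part name -> (packed name, position within the pack).
    part_to_packed: dict[str, tuple[str, int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Derive the reverse lookup from the forward mapping.
        for packed_name, parts in self.packed_to_parts.items():
            for index, part in enumerate(parts):
                self.part_to_packed[part] = (packed_name, index)


mapping = PackedParamMap(
    {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
)

# Look up which packed parameter a checkpoint weight belongs to.
packed_name, shard_index = mapping.part_to_packed["k_proj"]
```

Storing the position alongside the packed name lets a loader decide which slice of the fused weight a given part should fill.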
configure_quant_config(quant_config, model_class)
Pass packed_modules_mapping by reference to quant_config so that quant_config can properly match fused modules.
Note that model attributes are passed by reference to quant_config, enabling them to be updated by model_class.__new__ (e.g. chatglm, qwen).
Once the SupportsQuant mixin has been added to all models, this function can be removed.
Source code in vllm/model_executor/model_loader/utils.py
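The effect of passing the mapping by reference rather than copying it can be sketched as follows. This is a toy illustration, not the function's actual body: `ToyQuantConfig`, `ToyModel`, and `share_packed_mapping` are hypothetical, and the module names in the mapping are made up for the example.

```python
class ToyQuantConfig:
    """Hypothetical stand-in for a quantization config object."""

    def __init__(self) -> None:
        self.packed_modules_mapping: dict[str, list[str]] = {}


class ToyModel:
    """Hypothetical model class exposing a class-level mapping."""

    packed_modules_mapping: dict[str, list[str]] = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }


def share_packed_mapping(quant_config: ToyQuantConfig, model_class: type) -> None:
    # Assign the same dict object (no copy), so both names refer to one mapping.
    quant_config.packed_modules_mapping = model_class.packed_modules_mapping


quant_config = ToyQuantConfig()
share_packed_mapping(quant_config, ToyModel)

# A later in-place update made on the model class side...
ToyModel.packed_modules_mapping["gate_up_proj"] = ["gate_proj", "up_proj"]

# ...is visible through the config, because the dict object is shared.
shared_keys = sorted(quant_config.packed_modules_mapping)
```

Had the function stored a copy of the dictionary instead, the later update would not reach the config.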
initialize_model(vllm_config, *, prefix='', model_class=None, model_config=None)
Initialize a model with the given configurations.