vllm.v1.attention.backends.mla.prefill
Modules:
- base – Abstract base class for MLA prefill backends.
- flash_attn – FlashAttention backend for MLA prefill.
- flashinfer – FlashInfer backend for MLA prefill.
- registry – Registry for MLA prefill backends.
- selector – Selector for MLA prefill backends.
- tokenspeed_mla – TokenSpeed CuTe DSL backend for MLA prefill.
- trtllm_ragged – TRT-LLM Ragged backend for MLA prefill.
Classes:
- MLAPrefillBackend – Abstract base class for MLA prefill backends.
- MLAPrefillBackendEnum – Enumeration of all supported MLA prefill backends.
Functions:
- get_mla_prefill_backend – Select the MLA prefill backend based on configuration and device.
- register_mla_prefill_backend – Register or override an MLA prefill backend implementation.
MLAPrefillBackend
Bases: ABC
Abstract base class for MLA prefill backends.
Methods:
- prepare_metadata – Prepare backend-specific metadata before the forward pass.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
prepare_metadata(prefill_metadata)
Prepare backend-specific metadata before the forward pass.
Called by the metadata builder after constructing the prefill metadata.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
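The hook described above follows a common pattern: the builder constructs the metadata first, then hands it to the backend so the backend can attach whatever it precomputes. The sketch below illustrates that ordering only. Every name in it (PrefillMetadata, PrefillBackendSketch, RaggedBackendSketch, run_prefill, build_prefill_metadata, query_start_loc, backend_state) is hypothetical and is not vLLM's API; the no-op default for prepare_metadata is also an assumption made for the sketch.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


# Hypothetical stand-in for the prefill metadata object; vLLM's real type differs.
@dataclass
class PrefillMetadata:
    query_start_loc: list
    backend_state: dict = field(default_factory=dict)


class PrefillBackendSketch(ABC):
    """Hypothetical base class mirroring the prepare_metadata hook."""

    def prepare_metadata(self, prefill_metadata: PrefillMetadata) -> None:
        # Assumed default for this sketch: nothing backend-specific to prepare.
        return None

    @abstractmethod
    def run_prefill(self, prefill_metadata: PrefillMetadata) -> str:
        """Hypothetical forward entry point."""


class RaggedBackendSketch(PrefillBackendSketch):
    def prepare_metadata(self, prefill_metadata: PrefillMetadata) -> None:
        # Precompute per-sequence lengths once, before the forward pass.
        starts = prefill_metadata.query_start_loc
        prefill_metadata.backend_state["seq_lens"] = [
            end - start for start, end in zip(starts, starts[1:])
        ]

    def run_prefill(self, prefill_metadata: PrefillMetadata) -> str:
        # Reads the state that prepare_metadata attached earlier.
        return f"prefill over {prefill_metadata.backend_state['seq_lens']}"


def build_prefill_metadata(backend: PrefillBackendSketch, query_start_loc: list) -> PrefillMetadata:
    """Hypothetical builder: construct the metadata, then let the backend prepare it."""
    metadata = PrefillMetadata(query_start_loc=query_start_loc)
    backend.prepare_metadata(metadata)
    return metadata
```

Calling build_prefill_metadata(RaggedBackendSketch(), [0, 3, 7, 12]) yields metadata whose backend_state holds the lengths [3, 4, 5], ready for the forward pass.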
MLAPrefillBackendEnum
Bases: Enum
Enumeration of all supported MLA prefill backends.
Methods:
- clear_override – Clear any override for this backend, reverting to the default.
- get_class – Get the backend class (respects overrides).
- get_path – Get the class path for this backend (respects overrides).
- is_overridden – Check if this backend has been overridden.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
clear_override()
Clear any override for this backend, reverting to the default.
get_class()
Get the backend class (respects overrides).
Returns:
- type[MLAPrefillBackend] – The backend class.
Raises:
- ImportError – If the backend class cannot be imported.
- ValueError – If CUSTOM is used without being registered.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
get_path()
Get the class path for this backend (respects overrides).
Returns:
- str – The fully qualified class path string.
Raises:
- ValueError – If MLAPrefillBackendEnum.CUSTOM is used without being registered.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
get_mla_prefill_backend(vllm_config)
Select the MLA prefill backend based on configuration and device.
This function first checks for an explicit user preference in the mla_prefill_backend field of AttentionConfig, then falls back to automatic priority-based selection.
Parameters:
- vllm_config (VllmConfig) – The vLLM configuration.
Returns:
- type[MLAPrefillBackend] – The selected prefill backend class.
Source code in vllm/v1/attention/backends/mla/prefill/selector.py
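The two-stage selection described above (explicit preference first, then a priority-ordered fallback) can be sketched as a plain function. The names AttentionConfigSketch, select_prefill_backend, priority, and is_supported are hypothetical; the real selector works on VllmConfig, returns a backend class rather than a name, and its priority order and device checks are not stated here. Raising an error when the explicitly requested backend is unsupported is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttentionConfigSketch:
    # Hypothetical mirror of the mla_prefill_backend field; None means "auto".
    mla_prefill_backend: Optional[str] = None


def select_prefill_backend(
    config: AttentionConfigSketch,
    priority: list,
    is_supported: Callable[[str], bool],
) -> str:
    """Explicit user choice wins; otherwise pick the first supported backend."""
    if config.mla_prefill_backend is not None:
        choice = config.mla_prefill_backend
        # Assumed behavior: reject an explicit choice the device cannot run.
        if not is_supported(choice):
            raise ValueError(f"{choice} is not supported on this device")
        return choice
    # Automatic path: walk the priority list from most to least preferred.
    for name in priority:
        if is_supported(name):
            return name
    raise RuntimeError("No supported MLA prefill backend found")
```

For example, with priority ["fast_backend", "portable_backend"] and a device that supports only "portable_backend", the automatic path returns "portable_backend", while an explicit request for "fast_backend" raises ValueError.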
register_mla_prefill_backend(backend, class_path=None)
Register or override an MLA prefill backend implementation.
Parameters:
- backend (MLAPrefillBackendEnum) – The MLAPrefillBackendEnum member to register.
- class_path (str | None, default: None) – Optional fully qualified class path. If it is omitted and the function is used as a decorator, the path is auto-generated from the decorated class.
Returns:
Examples:
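Override an existing MLA prefill backend
from vllm.v1.attention.backends.mla.prefill import MLAPrefillBackend, MLAPrefillBackendEnum, register_mla_prefill_backend

@register_mla_prefill_backend(MLAPrefillBackendEnum.FLASH_ATTN)
class MyCustomFlashAttn(MLAPrefillBackend):
    ...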
Register a custom third-party MLA prefill backend
@register_mla_prefill_backend(MLAPrefillBackendEnum.CUSTOM)
class MyCustomPrefillBackend(MLAPrefillBackend):
    ...
Direct registration
register_mla_prefill_backend(
    MLAPrefillBackendEnum.CUSTOM, "my.module.MyCustomPrefillBackend"
)