vllm.v1.attention.backends
Modules:
- cpu_attn
- fa_utils
- flash_attn – Attention layer with FlashAttention.
- flash_attn_diffkv – Attention layer with FlashAttention.
- flashinfer – Attention layer with FlashInfer.
- flex_attention – Attention layer with FlexAttention.
- gdn_attn – Backend for GatedDeltaNet attention.
- mamba2_attn
- mamba_attn
- mla
- registry – Attention backend registry.
- rocm_aiter_fa – Attention layer with AiterFlashAttention.
- rocm_aiter_unified_attn – Attention layer with PagedAttention and Triton prefix prefill.
- rocm_attn – Attention layer with PagedAttention and Triton prefix prefill.
- triton_attn – High-Performance Triton-only Attention layer.
- turboquant_attn – TurboQuant attention backend for vLLM.
- utils
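
Several of the modules listed here target specific hardware or optional dependencies, so it can be useful to check which of them the import system can locate in a given environment. The following is a minimal sketch using only the standard library's `importlib.util.find_spec`. The helper names `qualified_name` and `is_locatable` are hypothetical and are not part of vLLM; the submodule names are copied from the listing. Note that locating a module does not guarantee that importing it will succeed.

```python
import importlib.util

PACKAGE = "vllm.v1.attention.backends"

# Submodule names copied from the module listing on this page.
SUBMODULES = [
    "cpu_attn", "fa_utils", "flash_attn", "flash_attn_diffkv", "flashinfer",
    "flex_attention", "gdn_attn", "mamba2_attn", "mamba_attn", "mla",
    "registry", "rocm_aiter_fa", "rocm_aiter_unified_attn", "rocm_attn",
    "triton_attn", "turboquant_attn", "utils",
]


def qualified_name(submodule: str) -> str:
    """Return the dotted import path for a listed submodule (hypothetical helper)."""
    return f"{PACKAGE}.{submodule}"


def is_locatable(submodule: str) -> bool:
    """Report whether the import system can locate the submodule (hypothetical helper).

    find_spec imports the parent packages first, so a missing or broken
    vLLM installation raises an exception; that is treated as "not locatable".
    """
    try:
        return importlib.util.find_spec(qualified_name(submodule)) is not None
    except Exception:
        # Covers ModuleNotFoundError for a missing parent package as well as
        # errors raised while the parent packages are being imported.
        return False


if __name__ == "__main__":
    for name in SUBMODULES:
        status = "found" if is_locatable(name) else "missing"
        print(f"{qualified_name(name)}: {status}")
```

Run as a script, this prints one line per listed module. In an environment without vLLM installed, every entry is reported as missing.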