vllm.v1.attention.ops
Modules:
- chunked_prefill_paged_decode
- common
- dcp_alltoall – DCP All-to-All communication backend for attention.
- flashmla
- merge_attn_states
- rocm_aiter_mla_sparse
- triton_attention_helpers – Shared @triton.jit helpers used by the unified attention kernel
- triton_decode_attention – Memory-efficient attention for decoding.
- triton_prefill_attention – Memory-efficient attention for prefill.
- triton_reshape_and_cache_flash
- triton_turboquant_decode – Triton fused TurboQuant decode attention.
- triton_turboquant_store – Fused Triton kernels for TurboQuant KV store.
- triton_unified_attention
- vit_attn_wrappers – This file contains ops for ViT attention to be compatible with torch.compile
- xpu_mla_sparse