`vllm.v1.attention.ops`

Modules:

chunked_prefill_paged_decode
common
dcp_alltoall – DCP All-to-All communication backend for attention.
flashmla
merge_attn_states
rocm_aiter_mla_sparse
triton_attention_helpers – Shared @triton.jit helpers used by the unified attention kernel.
triton_decode_attention – Memory-efficient attention for decoding.
triton_prefill_attention – Memory-efficient attention for prefill.
triton_reshape_and_cache_flash
triton_turboquant_decode – Triton fused TurboQuant decode attention.
triton_turboquant_store – Fused Triton kernels for TurboQuant KV store.
triton_unified_attention
vit_attn_wrappers – Ops that make ViT attention compatible with torch.compile.
xpu_mla_sparse
