vllm.v1.attention.selector
Functions:
- get_attn_backend – Selects which attention backend to use and lazily imports it.
- get_mamba_attn_backend – Select which mamba attention backend to use and lazily import it.
get_attn_backend(head_size, dtype, kv_cache_dtype, use_mla=False, has_sink=False, use_sparse=False, use_mm_prefix=False, use_per_head_quant_scales=False, attn_type=None, num_heads=None)
Selects which attention backend to use and lazily imports it.
Defined in vllm/v1/attention/selector.py.
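To illustrate the "select, then lazily import" idea in the docstring, here is a minimal, self-contained sketch. Everything in it is hypothetical: the registry keys, the module paths, the toy selection rule (MLA if requested, otherwise FLASH_ATTN when head_size is a multiple of 8 and there is no sink, else TRITON), and the helper names choose_backend_name, load_class, and get_backend_sketch. None of these are vLLM's real policy or import paths. The point is only that a backend's module is imported after it has been chosen, not when the selector module is loaded.

```python
import importlib
from functools import cache

# Hypothetical registry: backend name -> "module.path:ClassName".
# These keys and paths are placeholders, NOT vLLM's real backends.
_BACKEND_PATHS: dict[str, str] = {
    "FLASH_ATTN": "my_backends.flash:FlashAttnBackend",
    "MLA": "my_backends.mla:MLABackend",
    "TRITON": "my_backends.triton:TritonBackend",
}


def choose_backend_name(head_size: int, use_mla: bool = False,
                        has_sink: bool = False) -> str:
    """Toy selection rule (an assumption, not vLLM's actual policy)."""
    if use_mla:
        return "MLA"
    if head_size % 8 == 0 and not has_sink:
        return "FLASH_ATTN"
    return "TRITON"


@cache
def load_class(qualname: str) -> type:
    """Import "module.path:ClassName" on first request and memoize the result."""
    module_name, _, attr = qualname.partition(":")
    module = importlib.import_module(module_name)  # deferred until selection
    return getattr(module, attr)


def get_backend_sketch(head_size: int, use_mla: bool = False,
                       has_sink: bool = False,
                       registry: dict[str, str] = _BACKEND_PATHS) -> type:
    """Pick a backend name, then import only that backend's module."""
    name = choose_backend_name(head_size, use_mla, has_sink)
    return load_class(registry[name])
```

Because imports are deferred, a registry entry that points at a package which is not installed only fails if the selection rule actually picks it. For example, with a registry mapping "FLASH_ATTN" to "fractions:Fraction" and "TRITON" to a missing package, get_backend_sketch(128, registry=...) returns fractions.Fraction without ever touching the missing package.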
get_mamba_attn_backend(mamba_type)
Select which mamba attention backend to use and lazily import it.
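A minimal sketch of selecting a backend keyed by mamba_type and importing it on demand follows. The table keys ("mamba1", "mamba2"), the module paths, the helper name get_mamba_backend_sketch, and the ValueError raised for an unknown mamba_type are all assumptions for illustration; they are not confirmed behavior or values of get_mamba_attn_backend.

```python
import importlib

# Hypothetical table: mamba_type -> "module.path:ClassName".
# Keys and paths are illustrative placeholders, not vLLM's actual values.
MAMBA_BACKENDS: dict[str, str] = {
    "mamba1": "my_backends.mamba1:Mamba1Backend",
    "mamba2": "my_backends.mamba2:Mamba2Backend",
}


def get_mamba_backend_sketch(mamba_type: str,
                             table: dict[str, str] = MAMBA_BACKENDS) -> type:
    """Look up mamba_type, then import only the matching backend module."""
    try:
        qualname = table[mamba_type]
    except KeyError:
        # Assumed behavior: fail fast and list the supported types.
        valid = ", ".join(sorted(table))
        raise ValueError(
            f"Unsupported mamba_type {mamba_type!r}; expected one of: {valid}"
        ) from None
    module_name, _, attr = qualname.partition(":")
    # The module is imported here, on demand, not when this file is loaded.
    return getattr(importlib.import_module(module_name), attr)
```

Keeping the table as strings rather than class objects means that backends whose dependencies are missing on a given machine do not break the import of the selector itself.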