Bases: ABC
Base class for attention-like layers (Attention, Mamba, etc.) that support the v1 engine.
This provides a common interface for getting attention backends from different layer types.
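Methods:
- get_attn_backend: Get the attention backend class for this layer.
- get_kv_cache_spec: Get the KV cache spec for this layer.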
Source code in vllm/model_executor/layers/attention_layer_base.py
    class AttentionLayerBase(ABC):
        """
        Base class for attention-like layers (Attention, Mamba, etc.)
        that support the v1 engine.

        This provides a common interface for getting attention backends
        from different layer types.
        """

        impl: "AttentionImpl"

        @abstractmethod
        def get_attn_backend(self) -> type[AttentionBackend]:
            """Get the attention backend class for this layer."""
            pass

        @abstractmethod
        def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None:
            """
            Get the KV cache spec for this layer.
            May be None if the layer does not need KV cache.
            """
            pass
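As a rough illustration of how a concrete layer might satisfy this interface, the sketch below implements both abstract methods. So that it runs without vLLM installed, `AttentionBackend`, `KVCacheSpec`, and `VllmConfig` are replaced by hypothetical stand-in classes, and the base class is re-declared in simplified form. `ToyBackend`, `ToyAttentionLayer`, and the `block_size` field are illustrative names, not part of vLLM.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-ins for the real vLLM types (assumptions for this sketch).
class AttentionBackend:
    pass


@dataclass
class KVCacheSpec:
    block_size: int  # assumed field, for illustration only


@dataclass
class VllmConfig:
    block_size: int = 16  # assumed field, for illustration only


# Simplified re-declaration of the interface documented above.
class AttentionLayerBase(ABC):
    @abstractmethod
    def get_attn_backend(self) -> type[AttentionBackend]: ...

    @abstractmethod
    def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None: ...


class ToyBackend(AttentionBackend):
    pass


class ToyAttentionLayer(AttentionLayerBase):
    def get_attn_backend(self) -> type[AttentionBackend]:
        # The method returns the backend class itself, not an instance.
        return ToyBackend

    def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None:
        # Derive the spec from the (stand-in) config.
        return KVCacheSpec(block_size=vllm_config.block_size)


layer = ToyAttentionLayer()
backend_cls = layer.get_attn_backend()
spec = layer.get_kv_cache_spec(VllmConfig(block_size=32))
```

Because both methods are abstract, instantiating `AttentionLayerBase` directly, or a subclass that omits either method, raises `TypeError`.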
get_attn_backend() abstractmethod
Get the attention backend class for this layer.
Source code in vllm/model_executor/layers/attention_layer_base.py
    @abstractmethod
    def get_attn_backend(self) -> type[AttentionBackend]:
        """Get the attention backend class for this layer."""
        pass
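Since `get_attn_backend()` returns a class rather than an instance, a caller could use it as a dictionary key, for example to group layers that share a backend. The following is a minimal sketch under that assumption. The backend classes, layer classes, layer names, and the `group_by_backend` helper are all hypothetical, and the base class is reduced to the one method needed here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import defaultdict


# Hypothetical stand-ins; none of these classes come from vLLM.
class AttentionBackend:
    pass


class FlashLikeBackend(AttentionBackend):
    pass


class MambaLikeBackend(AttentionBackend):
    pass


# Simplified interface: only the method this sketch needs.
class AttentionLayerBase(ABC):
    @abstractmethod
    def get_attn_backend(self) -> type[AttentionBackend]: ...


class ToyAttention(AttentionLayerBase):
    def get_attn_backend(self) -> type[AttentionBackend]:
        return FlashLikeBackend


class ToyMamba(AttentionLayerBase):
    def get_attn_backend(self) -> type[AttentionBackend]:
        return MambaLikeBackend


def group_by_backend(
    layers: dict[str, AttentionLayerBase],
) -> dict[type[AttentionBackend], list[str]]:
    """Map each backend class to the names of the layers that use it."""
    groups: dict[type[AttentionBackend], list[str]] = defaultdict(list)
    for name, layer in layers.items():
        groups[layer.get_attn_backend()].append(name)
    return dict(groups)


layers = {
    "layers.0.attn": ToyAttention(),
    "layers.1.mixer": ToyMamba(),
    "layers.2.attn": ToyAttention(),
}
groups = group_by_backend(layers)
```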
get_kv_cache_spec(vllm_config) abstractmethod
Get the KV cache spec for this layer. Returns None if the layer does not need a KV cache.
Source code in vllm/model_executor/layers/attention_layer_base.py
    @abstractmethod
    def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None:
        """
        Get the KV cache spec for this layer.
        May be None if the layer does not need KV cache.
        """
        pass
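Because the return type is `KVCacheSpec | None`, callers need to handle the `None` case explicitly. The sketch below shows one way a caller might collect specs while skipping layers that need no KV cache. `KVCacheSpec` and `VllmConfig` are hypothetical stand-ins with an assumed `block_size` field, and `CachedLayer`, `CachelessLayer`, and `collect_kv_cache_specs` are illustrative names rather than vLLM APIs.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-ins for the real vLLM types (assumptions for this sketch).
@dataclass(frozen=True)
class KVCacheSpec:
    block_size: int  # assumed field, for illustration only


@dataclass
class VllmConfig:
    block_size: int = 16  # assumed field, for illustration only


# Simplified interface: only the method this sketch needs.
class AttentionLayerBase(ABC):
    @abstractmethod
    def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None: ...


class CachedLayer(AttentionLayerBase):
    def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None:
        return KVCacheSpec(block_size=vllm_config.block_size)


class CachelessLayer(AttentionLayerBase):
    def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None:
        # This layer keeps no KV cache, so it reports no spec.
        return None


def collect_kv_cache_specs(
    layers: dict[str, AttentionLayerBase], vllm_config: VllmConfig
) -> dict[str, KVCacheSpec]:
    """Return specs keyed by layer name, omitting layers that return None."""
    specs: dict[str, KVCacheSpec] = {}
    for name, layer in layers.items():
        spec = layer.get_kv_cache_spec(vllm_config)
        if spec is not None:
            specs[name] = spec
    return specs


layers = {"layers.0.attn": CachedLayer(), "layers.1.other": CachelessLayer()}
specs = collect_kv_cache_specs(layers, VllmConfig())
```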