vllm.model_executor.layers.mamba.ops.ssu_dispatch
Dispatch module for Mamba selective state update (SSU) backends.
Provides a unified selective_state_update function that dispatches to either the Triton or the FlashInfer backend, depending on the configured MambaBackendEnum. The design follows SGLang's dispatch pattern, adapted for vLLM.
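The dispatch pattern described above can be illustrated with a minimal, self-contained sketch. It is not vLLM's implementation: `BackendKind`, `SSUBackend`, `TritonLike`, `FlashInferLike`, `initialize_backend`, `get_backend`, and the `needs_ssu` flag are hypothetical stand-ins for `MambaBackendEnum`, `MambaSSUBackend`, the two concrete backends, and the module-level functions. The `RuntimeError` raised before initialization is also an assumption, since the page only says the getter "raises".

```python
from abc import ABC, abstractmethod
from enum import Enum


class BackendKind(Enum):
    """Hypothetical stand-in for MambaBackendEnum."""
    TRITON = "triton"
    FLASHINFER = "flashinfer"


class SSUBackend(ABC):
    """Plays the role of MambaSSUBackend: one abstract entry point."""

    @abstractmethod
    def __call__(self, state, x, **kwargs):
        ...


class TritonLike(SSUBackend):
    def __call__(self, state, x, **kwargs):
        # A real backend would launch a Triton kernel here.
        return ("triton", state, x)


class FlashInferLike(SSUBackend):
    def __call__(self, state, x, **kwargs):
        # A real backend would call into FlashInfer here.
        return ("flashinfer", state, x)


_BACKENDS = {
    BackendKind.TRITON: TritonLike,
    BackendKind.FLASHINFER: FlashInferLike,
}
_backend = None  # module-level singleton, set once at startup


def initialize_backend(kind, needs_ssu=True):
    """Select the global backend; do nothing if no layer needs SSU."""
    global _backend
    if not needs_ssu:
        return
    _backend = _BACKENDS[kind]()


def get_backend():
    """Return the active backend, failing loudly if none was selected."""
    if _backend is None:
        raise RuntimeError("SSU backend not initialized")
    return _backend


def selective_state_update(state, x, **kwargs):
    """Unified entry point: forward the call to the active backend."""
    return get_backend()(state, x, **kwargs)
```

Keeping the backend in a module-level singleton means call sites only import one function and never branch on the backend themselves; the cost is that initialization must happen before the first call.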
Classes:
- FlashInferSSUBackend – FlashInfer-based SSU backend.
- MambaSSUBackend – Abstract base class for Mamba SSU backends.
- TritonSSUBackend – Triton-based SSU backend (vLLM's default).
Functions:
- get_mamba_ssu_backend – Get the current Mamba SSU backend. Raises if not initialized.
- initialize_mamba_ssu_backend – Initialize the global Mamba SSU backend.
- selective_state_update – Unified dispatch for Mamba selective state update.
FlashInferSSUBackend
Bases: MambaSSUBackend
FlashInfer-based SSU backend.
Source code in vllm/model_executor/layers/mamba/ops/ssu_dispatch.py
MambaSSUBackend
Bases: ABC
Abstract base class for Mamba SSU backends.
TritonSSUBackend
Bases: MambaSSUBackend
Triton-based SSU backend (vLLM's default).
get_mamba_ssu_backend()
Get the current Mamba SSU backend. Raises if not initialized.
initialize_mamba_ssu_backend(mamba_config, kv_cache_config)
Initialize the global Mamba SSU backend.
No-op if kv_cache_config contains no specs that call selective_state_update.
selective_state_update(state, x, dt, A, B, C, D, dt_bias, z=None, dt_softplus=False, state_batch_indices=None, dst_state_batch_indices=None, null_block_id=NULL_BLOCK_ID, out=None, num_accepted_tokens=None, cu_seqlens=None, is_blackwell=False)
Unified dispatch for Mamba selective state update.
Delegates to the initialized backend (Triton or FlashInfer).
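For orientation, the computation that a selective state update performs can be written as a small pure-Python reference. This is a sketch of the standard single-token Mamba recurrence, not the vLLM kernel: it assumes one sequence and one head, with `state` and `A` of shape (dim, dstate), `B` and `C` of shape (dstate,), and `x`, `dt`, `D`, `dt_bias`, `z` of shape (dim,). The function name `ssu_reference` is hypothetical, and batching and the cache-indexing arguments (`state_batch_indices`, `dst_state_batch_indices`, `num_accepted_tokens`, `cu_seqlens`, and so on) are not modeled.

```python
import math


def softplus(v):
    # log(1 + exp(v)); adequate for small inputs in a sketch.
    return math.log1p(math.exp(v))


def silu(v):
    # v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def ssu_reference(state, x, dt, A, B, C, D=None, dt_bias=None,
                  z=None, dt_softplus=False):
    """Update `state` in place for one token and return the output row."""
    out = []
    for d in range(len(x)):
        # Effective step size for channel d.
        step = dt[d] + (dt_bias[d] if dt_bias is not None else 0.0)
        if dt_softplus:
            step = softplus(step)
        y = 0.0
        for n in range(len(B)):
            # state <- state * exp(dt * A) + dt * B * x
            decay = math.exp(step * A[d][n])
            state[d][n] = state[d][n] * decay + step * B[n] * x[d]
            # y accumulates <state, C> over the state dimension.
            y += state[d][n] * C[n]
        if D is not None:
            y += D[d] * x[d]      # skip connection
        if z is not None:
            y *= silu(z[d])       # output gating
        out.append(y)
    return out


state = [[0.0]]
y = ssu_reference(state, x=[2.0], dt=[0.5], A=[[-1.0]],
                  B=[1.0], C=[3.0], D=[1.0])
print(y, state)  # [5.0] [[1.0]]
```

In the example call the state starts at zero, so the new state is `0.5 * 1.0 * 2.0 = 1.0` and the output is `1.0 * 3.0 + 1.0 * 2.0 = 5.0`.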