`vllm.config.mamba` ¶

Classes:

MambaBackendEnum –

Enumeration of supported Mamba SSU (selective state update) backends.
MambaConfig –

Configuration for Mamba SSM backends.

`MambaBackendEnum` ¶

Bases: Enum

Enumeration of supported Mamba SSU (selective state update) backends.

Source code in vllm/config/mamba.py

class MambaBackendEnum(Enum, metaclass=_MambaBackendEnumMeta):
    """Enumeration of supported Mamba SSU (selective state update) backends."""

    TRITON = "triton"
    FLASHINFER = "flashinfer"

`MambaConfig` ¶

Configuration for Mamba SSM backends.

Methods:

validate_backend_before –

Enable parsing of the backend enum type from string.

Attributes:

backend (MambaBackendEnum) –

Mamba SSU backend to use.
enable_stochastic_rounding (bool) –

Enable stochastic rounding when writing SSM state to fp16 cache.
stochastic_rounding_philox_rounds (int) –

Number of Philox PRNG rounds for stochastic rounding random number

Source code in vllm/config/mamba.py

@config
class MambaConfig:
    """Configuration for Mamba SSM backends."""

    backend: MambaBackendEnum = MambaBackendEnum.TRITON
    """Mamba SSU backend to use."""

    enable_stochastic_rounding: bool = False
    """Enable stochastic rounding when writing SSM state to fp16 cache.
    Uses random bits to unbias the rounding error, which can improve
    numerical stability for long sequences."""
    stochastic_rounding_philox_rounds: int = 0
    """Number of Philox PRNG rounds for stochastic rounding random number
    generation. 0 uses the Triton default. Higher values improve randomness
    quality at the cost of compute."""

    @field_validator("backend", mode="before")
    @classmethod
    def validate_backend_before(cls, value: Any) -> Any:
        """Enable parsing of the `backend` enum type from string."""
        if isinstance(value, str):
            return MambaBackendEnum[value.upper()]
        return value

    def __post_init__(self):
        if self.enable_stochastic_rounding:
            from vllm.platforms import current_platform

            if not current_platform.is_cuda():
                raise ValueError(
                    "Stochastic rounding for Mamba cache is only supported "
                    "on NVIDIA CUDA platforms. Please do not specify  "
                    "`--enable-mamba-cache-stochastic-rounding`."
                )
            if (
                self.backend == MambaBackendEnum.TRITON
                and not current_platform.is_device_capability_family(100)
            ):
                raise ValueError(
                    "Stochastic rounding for Mamba cache with triton backend requires "
                    "compute capability 10.0 (data center Blackwell). The `cvt.rs` "
                    "PTX instruction is not supported on your GPU. Please do not "
                    "specify `--enable-mamba-cache-stochastic-rounding`, "
                    "or set `--mamba-backend flashinfer`."
                )

`backend = MambaBackendEnum.TRITON` `class-attribute` `instance-attribute` ¶

Mamba SSU backend to use.

`enable_stochastic_rounding = False` `class-attribute` `instance-attribute` ¶

Enable stochastic rounding when writing SSM state to fp16 cache. Uses random bits to unbias the rounding error, which can improve numerical stability for long sequences.

`stochastic_rounding_philox_rounds = 0` `class-attribute` `instance-attribute` ¶

Number of Philox PRNG rounds for stochastic rounding random number generation. 0 uses the Triton default. Higher values improve randomness quality at the cost of compute.

`validate_backend_before(value)` `classmethod` ¶

Enable parsing of the backend enum type from string.

Source code in vllm/config/mamba.py

@field_validator("backend", mode="before")
@classmethod
def validate_backend_before(cls, value: Any) -> Any:
    """Enable parsing of the `backend` enum type from string."""
    if isinstance(value, str):
        return MambaBackendEnum[value.upper()]
    return value

`_MambaBackendEnumMeta` ¶

Bases: EnumMeta

Metaclass for MambaBackendEnum to provide better error messages.

Source code in vllm/config/mamba.py

class _MambaBackendEnumMeta(EnumMeta):
    """Metaclass for MambaBackendEnum to provide better error messages."""

    def __getitem__(cls, name: str):
        try:
            return super().__getitem__(name)
        except KeyError:
            valid = ", ".join(cls.__members__.keys())
            raise ValueError(
                f"Unknown Mamba SSU backend: '{name}'. Valid options are: {valid}"
            ) from None

vllm.config.mamba ¶

MambaBackendEnum ¶

MambaConfig ¶

backend = MambaBackendEnum.TRITON class-attribute instance-attribute ¶

enable_stochastic_rounding = False class-attribute instance-attribute ¶

stochastic_rounding_philox_rounds = 0 class-attribute instance-attribute ¶

validate_backend_before(value) classmethod ¶

_MambaBackendEnumMeta ¶

`vllm.config.mamba` ¶

`MambaBackendEnum` ¶

`MambaConfig` ¶

`backend = MambaBackendEnum.TRITON` `class-attribute` `instance-attribute` ¶

`enable_stochastic_rounding = False` `class-attribute` `instance-attribute` ¶

`stochastic_rounding_philox_rounds = 0` `class-attribute` `instance-attribute` ¶

`validate_backend_before(value)` `classmethod` ¶

`_MambaBackendEnumMeta` ¶