vllm.model_executor.models.mellum ¶
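Classes:
- MellumAttention – Differences from Qwen3MoeAttention: supports per_layer_sliding_window for Attention.
- MellumDecoderLayer – Differences from Qwen3MoeDecoderLayer: supports interleaved sliding window attention (SWA) and per-layer RoPE scaling.
- MellumForCausalLM – Differences from Qwen3MoeForCausalLM: uses MellumModel.
- MellumModel – Differences from Qwen3MoeModel: uses MellumDecoderLayer.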
MellumAttention ¶
Bases: Qwen3MoeAttention
Differences from Qwen3MoeAttention:
- Supports per_layer_sliding_window for Attention.
Source code in vllm/model_executor/models/mellum.py
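To make the effect of a sliding window concrete, here is a minimal, framework-free sketch of the attention mask it produces. The function name `sliding_window_causal_mask` and the exact window convention (a query sees itself plus the `window - 1` keys before it) are assumptions for illustration; this is not how vLLM's attention backends build masks internally.

```python
from typing import List, Optional


def sliding_window_causal_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    window=None means plain causal (full) attention. Otherwise each query
    sees at most `window` positions: itself and the window - 1 keys before it.
    (Hypothetical convention, chosen only for this illustration.)
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                                 # never look ahead
            in_window = window is None or (i - j) < window  # drop keys that are too old
            row.append(causal and in_window)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print a small mask: 'x' marks an allowed query/key pair.
    for row in sliding_window_causal_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

A per-layer window then amounts to choosing a different `window` value (or `None`) for each layer's attention instead of one value for the whole model.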
MellumDecoderLayer ¶
Bases: Qwen3MoeDecoderLayer
Differences from Qwen3MoeDecoderLayer:
- Supports interleaved sliding window attention (SWA) and per-layer RoPE scaling.
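The sketch below illustrates one way "interleaved SWA" and "per-layer RoPE scaling" could be resolved into per-layer settings. Everything in it is hypothetical: the helper `resolve_layer_settings`, the `"sliding_attention"` / `"full_attention"` labels (loosely modelled on the `layer_types` convention seen in some Hugging Face configs), and the example scaling values. Whether Mellum's config uses these names is an assumption, not something this page confirms.

```python
from typing import Any, Dict, List, Optional

LAYER_KINDS = ("sliding_attention", "full_attention")  # hypothetical labels


def resolve_layer_settings(
    layer_types: List[str],
    sliding_window: int,
    rope_scaling_by_type: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Map each layer index to its attention window and RoPE scaling.

    Sliding layers get `sliding_window`; full-attention layers get None.
    RoPE scaling is looked up by layer kind and may be absent (None).
    """
    settings = []
    for idx, kind in enumerate(layer_types):
        if kind not in LAYER_KINDS:
            raise ValueError(f"layer {idx}: unknown layer type {kind!r}")
        window: Optional[int] = sliding_window if kind == "sliding_attention" else None
        settings.append(
            {
                "layer_idx": idx,
                "sliding_window": window,
                "rope_scaling": rope_scaling_by_type.get(kind),
            }
        )
    return settings


if __name__ == "__main__":
    # Example: three sliding layers followed by one full-attention layer.
    layer_types = ["sliding_attention"] * 3 + ["full_attention"]
    # Illustrative values only; not taken from any real Mellum config.
    scaling = {"full_attention": {"rope_type": "yarn", "factor": 4.0}}
    for entry in resolve_layer_settings(layer_types, 1024, scaling):
        print(entry)
```

In this arrangement a decoder layer only needs its own index to look up both values, which keeps the per-layer decision in one place.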
MellumForCausalLM ¶
Bases: Qwen3MoeForCausalLM
Differences from Qwen3MoeForCausalLM:
- Uses MellumModel.
MellumModel ¶
Bases: Qwen3MoeModel
Differences from Qwen3MoeModel:
- Uses MellumDecoderLayer.
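The four classes follow a common layering: the causal LM wraps a model, and the model stacks decoder layers, so swapping one class at each level is enough to change the whole stack. The toy classes below sketch that pattern with class attributes. All names (`ToyDecoderLayer`, `layer_cls`, `model_cls`, and so on) are hypothetical, and the actual mechanism vLLM uses to plug in `MellumModel` and `MellumDecoderLayer` may differ.

```python
class ToyDecoderLayer:
    """Stand-in for a base decoder layer."""

    def __init__(self, layer_idx: int) -> None:
        self.layer_idx = layer_idx

    def describe(self) -> str:
        return f"base layer {self.layer_idx}"


class ToyModel:
    """Stand-in for a base model: builds a stack of `layer_cls` instances."""

    layer_cls = ToyDecoderLayer  # subclasses override this hook

    def __init__(self, num_layers: int) -> None:
        self.layers = [self.layer_cls(i) for i in range(num_layers)]


class ToyCausalLM:
    """Stand-in for a base causal LM: wraps one `model_cls` instance."""

    model_cls = ToyModel  # subclasses override this hook

    def __init__(self, num_layers: int) -> None:
        self.model = self.model_cls(num_layers)


class CustomDecoderLayer(ToyDecoderLayer):
    def describe(self) -> str:
        return f"custom layer {self.layer_idx}"


class CustomModel(ToyModel):
    layer_cls = CustomDecoderLayer  # the only change at this level


class CustomCausalLM(ToyCausalLM):
    model_cls = CustomModel  # the only change at this level


if __name__ == "__main__":
    lm = CustomCausalLM(num_layers=2)
    print([layer.describe() for layer in lm.model.layers])
```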