vllm.model_executor.layers.fla.ops.layernorm_guard ¶
Classes:
LayerNormGated ¶
Bases: Module
Methods:
-
__init__–If group_size is not None, we do GroupNorm with each group having group_size elements.
-
forward–If z is not None, we do norm(x) * silu(z) if norm_before_gate, else norm(x * silu(z))
Source code in vllm/model_executor/layers/fla/ops/layernorm_guard.py
__init__(hidden_size, eps=1e-05, group_size=None, norm_before_gate=True, device=None, dtype=None) ¶
If group_size is not None, we do GroupNorm with each group having group_size elements. group_size=None is equivalent to group_size=hidden_size (i.e. there's only 1 group).
Source code in vllm/model_executor/layers/fla/ops/layernorm_guard.py
forward(x, z=None) ¶
If z is not None, we do norm(x) * silu(z) if norm_before_gate, else norm(x * silu(z))
Source code in vllm/model_executor/layers/fla/ops/layernorm_guard.py
RMSNormGated ¶
Bases: Module
Methods:
-
__init__–If group_size is not None, we do GroupNorm with each group having group_size elements.
-
forward–If z is not None, we do norm(x) * silu(z) if norm_before_gate, else norm(x * silu(z))
Source code in vllm/model_executor/layers/fla/ops/layernorm_guard.py
__init__(hidden_size, eps=1e-05, group_size=None, norm_before_gate=False, device=None, dtype=None, activation='swish') ¶
If group_size is not None, we do GroupNorm with each group having group_size elements. group_size=None is equivalent to group_size=hidden_size (i.e. there's only 1 group).
Source code in vllm/model_executor/layers/fla/ops/layernorm_guard.py
forward(x, z=None) ¶
If z is not None, we do norm(x) * silu(z) if norm_before_gate, else norm(x * silu(z))
Source code in vllm/model_executor/layers/fla/ops/layernorm_guard.py
_layer_norm_fn_impl(x, weight, bias, z=None, eps=1e-06, group_size=None, norm_before_gate=True, is_rms_norm=False, activation='swish') ¶
Triton layer/RMS norm with optional gating.
If z is not None, computes norm(x) * silu(z) when norm_before_gate, else norm(x * silu(z)).
This calls the triton kernel directly. The original code wrapped this in a torch.autograd.Function (LayerNormFn) to save tensors for a backward pass, but vLLM is inference-only so there is no backward pass. The autograd wrapper also prevented torch.compile/dynamo from tracing through the function due to its @staticmethod forward.