vllm.model_executor.layers.fused_moe.moe_fused_mul_sum
Functions:
- moe_fused_mul_sum – Fused kernel for MoE (Mixture of Experts) to perform weighted summation of expert outputs.
moe_fused_mul_sum(inputs, topk_weights, outputs=None, topk_ids=None, expert_map=None)
Fused kernel for MoE (Mixture of Experts) to perform weighted summation of expert outputs.
Parameters:
- inputs (Tensor) – The output from experts. Shape: (num_tokens, top_k, hidden_size).
- topk_weights (Tensor) – The weights assigned to each expert for each token. Shape: (num_tokens, top_k).
- outputs (Tensor | None, default: None) – Optional pre-allocated output tensor. Shape: (num_tokens, hidden_size).
- topk_ids (Tensor | None, default: None) – Optional indices of the top-k experts. Used when expert_map is provided. Shape: (num_tokens, top_k).
- expert_map (Tensor | None, default: None) – Optional mapping for Expert Parallelism. A value < 0 indicates an invalid token/expert pair that will be skipped.
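The weighted summation described by these parameters can be sketched in plain Python. This is an illustrative reference only, not the fused kernel: it uses nested lists in place of tensors so it runs without PyTorch or a GPU, and the name `moe_mul_sum_reference` is hypothetical. It also assumes that `expert_map` is indexed by the expert id taken from `topk_ids`, which the parameter descriptions suggest but do not state outright.

```python
def moe_mul_sum_reference(inputs, topk_weights, topk_ids=None, expert_map=None):
    """Reference semantics: out[t][h] = sum over k of w[t][k] * inputs[t][k][h].

    inputs:       nested list, shape (num_tokens, top_k, hidden_size)
    topk_weights: nested list, shape (num_tokens, top_k)
    topk_ids:     optional nested list of expert ids, shape (num_tokens, top_k)
    expert_map:   optional list; a value < 0 marks a pair to skip
    """
    num_tokens = len(inputs)
    hidden_size = len(inputs[0][0]) if inputs and inputs[0] else 0
    outputs = [[0.0] * hidden_size for _ in range(num_tokens)]

    for t in range(num_tokens):
        for k, weight in enumerate(topk_weights[t]):
            # Assumption: expert_map is looked up by the expert id in topk_ids.
            if expert_map is not None and topk_ids is not None:
                if expert_map[topk_ids[t][k]] < 0:
                    continue  # invalid token/expert pair, contributes nothing
            for h in range(hidden_size):
                outputs[t][h] += weight * inputs[t][k][h]
    return outputs


# Two tokens, top_k = 2, hidden_size = 2.
inputs = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 6.0], [7.0, 8.0]]]
topk_weights = [[0.5, 0.5],
                [0.25, 0.75]]

dense = moe_mul_sum_reference(inputs, topk_weights)

# With a map in which expert 1 is invalid, its contributions are skipped.
topk_ids = [[0, 1], [1, 2]]
expert_map = [0, -1, 1]
masked = moe_mul_sum_reference(inputs, topk_weights, topk_ids, expert_map)
```

Without `expert_map`, every (token, expert) pair contributes; with it, pairs that map to a negative value are left out of the sum rather than being renormalized.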
Returns: