vllm.distributed.eplb.policy.abstract
Classes:
AbstractEplbPolicy
Bases: ABC
Methods:
- rebalance_experts – Entry point for expert-parallelism load balancer.
Source code in vllm/distributed/eplb/policy/abstract.py
rebalance_experts(weight, num_replicas, num_groups, num_nodes, num_ranks, old_global_expert_indices=None) abstractmethod classmethod
Entry point for expert-parallelism load balancer.
Parameters:
- weight (Tensor) – [layers, num_logical_experts], the load statistics for all logical experts
- num_replicas (int) – number of physical experts, must be a multiple of num_ranks
- num_groups (int) – number of expert groups
- num_nodes (int) – number of server nodes
- num_ranks (int) – number of ranks, must be a multiple of num_nodes
- old_global_expert_indices (Tensor | None, default: None) – [layers, num_logical_experts], the old global expert indices. Used to avoid unnecessary weight copying for experts moving within one rank.
Returns: physical_to_logical_map: [layers, num_replicas], the expert index of each replica
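The contract above can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: `EplbPolicySketch` is a local stand-in that only mirrors the documented signature (it is not the vLLM class), `GreedyReplicaPolicy` is a hypothetical policy, nested Python lists stand in for the `Tensor` arguments, and the greedy replication rule is not claimed to be the algorithm vLLM uses. It shows how an abstract classmethod is declared and overridden, how the two documented divisibility constraints could be checked, and what a `[layers, num_replicas]` result looks like.

```python
from abc import ABC, abstractmethod


class EplbPolicySketch(ABC):
    """Local stand-in mirroring the documented signature (not the vLLM class)."""

    @classmethod
    @abstractmethod
    def rebalance_experts(cls, weight, num_replicas, num_groups, num_nodes,
                          num_ranks, old_global_expert_indices=None):
        """Return a [layers, num_replicas] physical-to-logical mapping."""


class GreedyReplicaPolicy(EplbPolicySketch):
    """Hypothetical policy: extra replicas go to the most loaded experts."""

    @classmethod
    def rebalance_experts(cls, weight, num_replicas, num_groups, num_nodes,
                          num_ranks, old_global_expert_indices=None):
        # Constraints stated in the parameter documentation.
        if num_replicas % num_ranks != 0:
            raise ValueError("num_replicas must be a multiple of num_ranks")
        if num_ranks % num_nodes != 0:
            raise ValueError("num_ranks must be a multiple of num_nodes")

        physical_to_logical_map = []
        for layer_load in weight:  # one row of load statistics per layer
            num_logical = len(layer_load)
            if num_replicas < num_logical:
                raise ValueError("need at least one replica per logical expert")
            counts = [1] * num_logical            # replicas per logical expert
            layer_map = list(range(num_logical))  # one replica each to start
            for _ in range(num_replicas - num_logical):
                # Pick the expert with the highest load per existing replica.
                hottest = max(range(num_logical),
                              key=lambda e: layer_load[e] / counts[e])
                counts[hottest] += 1
                layer_map.append(hottest)
            physical_to_logical_map.append(layer_map)
        # num_groups and old_global_expert_indices are ignored in this sketch.
        return physical_to_logical_map


# Two layers, four logical experts, six physical replicas over two ranks.
weight = [[10.0, 1.0, 1.0, 4.0], [2.0, 2.0, 9.0, 3.0]]
mapping = GreedyReplicaPolicy.rebalance_experts(
    weight, num_replicas=6, num_groups=1, num_nodes=1, num_ranks=2)
print(mapping)  # [[0, 1, 2, 3, 0, 0], [0, 1, 2, 3, 2, 2]]
```

Because the method is a classmethod, the sketch calls it on the class itself without creating an instance. A real policy would also be expected to take `num_groups`, the node layout, and `old_global_expert_indices` into account, which this sketch deliberately skips.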