vllm.config.ec_transfer
Classes:
- ECTransferConfig – Configuration for distributed EC cache transfer.
ECTransferConfig
Configuration for distributed EC cache transfer.
Methods:
- compute_hash – WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Attributes:
- ec_buffer_device (str | None) – The device used by the EC connector to buffer the EC cache.
- ec_buffer_size (float) – The buffer size for TorchDistributedConnector, measured in bytes.
- ec_connector (str | None) – The EC connector vLLM uses to transmit EC caches between vLLM instances.
- ec_connector_extra_config (dict[str, Any]) – Any extra configuration that the connector may need.
- ec_connector_module_path (str | None) – The Python module path from which to dynamically load the EC connector.
- ec_ip (str) – The IP address of the EC connector, used to build the distributed connection.
- ec_parallel_size (int) – The number of parallel instances for EC cache transfer. For PyNcclConnector, this should be 2.
- ec_port (int) – The port of the EC connector, used to build the distributed connection.
- ec_rank (int | None) – The rank of this vLLM instance in the EC cache transfer. Typical value: 0 for the encoder instance, 1 for the PD instance.
- ec_role (ECRole | None) – Whether this vLLM instance produces EC cache, consumes it, or both. Choices are 'ec_producer', 'ec_consumer', 'ec_both'.
- engine_id (str | None) – The engine ID for EC transfers.
Source code in vllm/config/ec_transfer.py
ec_buffer_device = 'cuda' class-attribute instance-attribute
The device used by the EC connector to buffer the EC cache. Currently only 'cuda' is supported.
ec_buffer_size = 1000000000.0 class-attribute instance-attribute
The buffer size for TorchDistributedConnector, measured in bytes. Recommended value: 1e9 (about 1 GB).
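Because the value is a plain byte count, the "about 1 GB" figure depends on the unit convention. A quick check of the arithmetic: 1e9 bytes is exactly 1 GB in decimal units but slightly less than 1 GiB in binary units.

```python
# ec_buffer_size is a byte count stored as a float (documented default: 1e9).
ec_buffer_size = 1e9

gb = ec_buffer_size / 1000**3   # decimal gigabytes
gib = ec_buffer_size / 1024**3  # binary gibibytes

print(f"{gb:.2f} GB, {gib:.2f} GiB")  # 1.00 GB, 0.93 GiB
```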
ec_connector = None class-attribute instance-attribute
The EC connector vLLM uses to transmit EC caches between vLLM instances.
ec_connector_extra_config = field(default_factory=dict) class-attribute instance-attribute
Any extra configuration that the connector may need.
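The default is declared with `field(default_factory=dict)` rather than `= {}`, so every config instance gets its own dictionary instead of sharing one mutable default (standard dataclasses reject a bare `= {}` default with a `ValueError`). A minimal sketch using a hypothetical stand-in class, `ExtraConfigDemo`, that is not the vLLM class; the `"buffer_path"` key is likewise illustrative only:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtraConfigDemo:
    # Hypothetical stand-in mirroring the documented field declaration.
    ec_connector_extra_config: dict[str, Any] = field(default_factory=dict)


a = ExtraConfigDemo()
b = ExtraConfigDemo()
a.ec_connector_extra_config["buffer_path"] = "/tmp/ec"  # illustrative key

# b keeps its own empty dict; the mutation on a does not leak across instances.
print(b.ec_connector_extra_config)  # {}
```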
ec_connector_module_path = None class-attribute instance-attribute
The Python module path from which to dynamically load the EC connector. Only supported in V1.
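Dynamic loading from a module path is commonly done with `importlib`. The sketch below assumes the connector class is looked up by the `ec_connector` name inside the module named by `ec_connector_module_path`; that pairing and the helper name `load_connector_class` are assumptions for illustration, not vLLM's actual factory code. A standard-library class stands in for a custom connector so the example runs anywhere.

```python
import importlib


def load_connector_class(module_path: str, class_name: str) -> type:
    """Hypothetical helper: import `module_path` and return `class_name` from it."""
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"{class_name!r} not found in module {module_path!r}"
        ) from exc


# Stand-in values: a stdlib class plays the role of a custom EC connector.
cls = load_connector_class("collections", "OrderedDict")
print(cls.__name__)  # OrderedDict
```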
ec_ip = '127.0.0.1' class-attribute instance-attribute
The IP address of the EC connector, used to build the distributed connection.
ec_parallel_size = 1 class-attribute instance-attribute
The number of parallel instances for EC cache transfer. For PyNcclConnector, this should be 2.
ec_port = 14579 class-attribute instance-attribute
The port of the EC connector, used to build the distributed connection.
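Together, `ec_ip` and `ec_port` identify one rendezvous endpoint. As an illustration only, assuming a connector that accepts a TCP-style URL like the `init_method` strings used by `torch.distributed` (whether any given EC connector uses this form is an assumption), the documented defaults combine as follows:

```python
# Documented defaults for the EC connector endpoint.
ec_ip = "127.0.0.1"
ec_port = 14579

# Hypothetical: a TCP init URL in the style of torch.distributed's init_method.
init_method = f"tcp://{ec_ip}:{ec_port}"
print(init_method)  # tcp://127.0.0.1:14579
```

Since 127.0.0.1 is a loopback address, the default only works when all instances run on the same host; a multi-node setup needs an address the peers can reach.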
ec_rank = None class-attribute instance-attribute
The rank of this vLLM instance in the EC cache transfer. Typical value: 0 for the encoder instance, 1 for the PD (prefill/decode) instance. Currently only 1P1D is supported.
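The typical values amount to a fixed role-to-rank convention. A minimal sketch, assuming a two-instance setup (so `ec_parallel_size` is 2) in which the encoder is the `ec_producer` and the PD instance is the `ec_consumer`; the helper `default_ec_rank` and the `TYPICAL_RANKS` mapping are hypothetical, derived only from the description above:

```python
# Hypothetical convention from the docs: 0 for the encoder, 1 for the PD instance.
TYPICAL_RANKS = {"ec_producer": 0, "ec_consumer": 1}


def default_ec_rank(ec_role: str, ec_parallel_size: int = 2) -> int:
    """Hypothetical helper: pick the typical rank for a role and range-check it."""
    if ec_role not in TYPICAL_RANKS:
        raise ValueError(f"no typical rank for role {ec_role!r}")
    rank = TYPICAL_RANKS[ec_role]
    # Ranks are assumed to be 0-based and smaller than the number of instances.
    if not 0 <= rank < ec_parallel_size:
        raise ValueError(f"rank {rank} out of range for {ec_parallel_size} instances")
    return rank


print(default_ec_rank("ec_producer"), default_ec_rank("ec_consumer"))  # 0 1
```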
ec_role = None class-attribute instance-attribute
Whether this vLLM instance produces EC cache, consumes it, or both. Choices are 'ec_producer', 'ec_consumer', 'ec_both'.
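The three choices can be validated and turned into producer/consumer flags. In this sketch the role type is modeled as a `Literal` (an assumption about how `ECRole` is defined), and `check_role`, `is_producer`, and `is_consumer` are hypothetical helpers; only the three role strings come from the documentation.

```python
from typing import Literal, get_args

# The three documented role strings, modeled as a Literal (assumption).
ECRoleDemo = Literal["ec_producer", "ec_consumer", "ec_both"]
VALID_ROLES = get_args(ECRoleDemo)


def check_role(role: str) -> str:
    if role not in VALID_ROLES:
        raise ValueError(f"ec_role must be one of {VALID_ROLES}, got {role!r}")
    return role


def is_producer(role: str) -> bool:
    # 'ec_both' both produces and consumes EC cache.
    return check_role(role) in ("ec_producer", "ec_both")


def is_consumer(role: str) -> bool:
    return check_role(role) in ("ec_consumer", "ec_both")


print(is_producer("ec_both"), is_consumer("ec_producer"))  # True False
```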
engine_id = None class-attribute instance-attribute
The engine ID for EC transfers.
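The attributes documented above are set together on each instance. A hypothetical pair of settings for the 1P1D layout described under `ec_rank`, written as plain dicts keyed by the documented attribute names. `"MyECConnector"` is a placeholder rather than a real vLLM connector, and using `ec_parallel_size` of 2 for a two-instance group is an assumption.

```python
# Settings both instances must agree on (placeholder connector name).
shared = {
    "ec_connector": "MyECConnector",  # hypothetical connector class name
    "ec_ip": "127.0.0.1",
    "ec_port": 14579,
    "ec_parallel_size": 2,  # assumption: two instances in the transfer group
}

# Encoder instance: produces EC cache, rank 0.
encoder_cfg = {**shared, "ec_role": "ec_producer", "ec_rank": 0}
# PD instance: consumes EC cache, rank 1.
pd_cfg = {**shared, "ec_role": "ec_consumer", "ec_rank": 1}

# Endpoint and group size match on both sides; only role and rank differ.
differing = {k for k in encoder_cfg if encoder_cfg[k] != pd_cfg[k]}
print(sorted(differing))  # ['ec_rank', 'ec_role']
```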
compute_hash()
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
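The warning describes a common pattern: collect the graph-affecting fields into a `factors` list and hash its string form. A minimal sketch under that assumption; `compute_hash_sketch` is hypothetical, and which fields (if any) belong in `factors` for this config is not stated here, so the field choices below are purely illustrative.

```python
import hashlib
from typing import Any


def compute_hash_sketch(config: dict[str, Any], graph_fields: list[str]) -> str:
    """Hypothetical: hash only the fields listed as affecting the graph."""
    factors = [config[name] for name in graph_fields]
    return hashlib.sha256(str(factors).encode()).hexdigest()


base = {"ec_ip": "127.0.0.1", "ec_port": 14579}
moved = {**base, "ec_port": 14580}

# With an empty factors list, changing the port leaves the hash unchanged.
print(compute_hash_sketch(base, []) == compute_hash_sketch(moved, []))  # True

# If a field were listed as graph-affecting, changing it would change the hash.
print(compute_hash_sketch(base, ["ec_port"]) == compute_hash_sketch(moved, ["ec_port"]))  # False
```

This is why the warning matters: a field left out of `factors` can change freely without invalidating anything keyed on the hash, so omitting a field that does affect the graph would silently reuse stale results.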