vllm.compilation.passes.utility.split_coalescing
Coalesce duplicate split_with_sizes nodes that operate on the same input tensor with the same split sizes.
On certain hardware/dtype combinations (e.g. B200 + FP8), the Inductor graph may contain multiple split_with_sizes calls on the same tensor that common subexpression elimination (CSE) fails to merge. This pass detects the duplicates and replaces them with a single node, so that downstream pattern-matching passes (e.g. QK-Norm+RoPE fusion) see one split node with all users attached.
See also
- vLLM #33295 (original issue)
- PyTorch #174472 (upstream CSE gap)
Classes:
- SplitCoalescingPass – Replace duplicate split_with_sizes nodes with a single canonical node.
SplitCoalescingPass
Bases: VllmInductorPass
Replace duplicate split_with_sizes nodes with a single canonical node when they share the same input tensor and split sizes.
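The coalescing idea can be illustrated with a small self-contained sketch. This is a toy model and not vLLM's implementation: the `Node` class, `add_node`, and `coalesce_splits` are hypothetical names invented here, and the graph is a plain Python list in topological order instead of a real Inductor FX graph. It only shows the core step, assuming duplicates are identified by the pair (input node, split sizes) and that the first split encountered is kept as the canonical one.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: nodes compare by identity, like graph nodes
class Node:
    """Hypothetical stand-in for a graph node (not torch.fx.Node)."""
    name: str
    op: str
    inputs: list = field(default_factory=list)  # upstream Node objects
    sizes: tuple = ()                            # split sizes, if any
    users: list = field(default_factory=list)   # downstream Node objects


def add_node(graph, name, op, inputs=(), sizes=()):
    """Append a node to the graph and register it as a user of its inputs."""
    node = Node(name, op, list(inputs), tuple(sizes))
    for inp in node.inputs:
        inp.users.append(node)
    graph.append(node)
    return node


def coalesce_splits(graph):
    """Merge split_with_sizes nodes that share the same input and sizes.

    Assumes `graph` is in topological order, so the first split seen
    precedes every user of any later duplicate.
    Returns the number of duplicate nodes removed.
    """
    canonical = {}  # (id of input node, sizes) -> first split seen
    removed = []
    for node in graph:
        if node.op != "split_with_sizes":
            continue
        key = (id(node.inputs[0]), node.sizes)
        keep = canonical.setdefault(key, node)
        if keep is node:
            continue  # first occurrence becomes the canonical node
        # Re-point every user of the duplicate at the canonical node.
        for user in node.users:
            user.inputs = [keep if i is node else i for i in user.inputs]
            keep.users.append(user)
        node.users.clear()
        node.inputs[0].users.remove(node)
        removed.append(node)
    for node in removed:
        graph.remove(node)
    return len(removed)


# Two identical splits of the same tensor, each feeding a different user.
graph = []
qkv = add_node(graph, "qkv", "placeholder")
split_a = add_node(graph, "split_a", "split_with_sizes", [qkv], (64, 16, 16))
split_b = add_node(graph, "split_b", "split_with_sizes", [qkv], (64, 16, 16))
q = add_node(graph, "q", "getitem", [split_a])
k = add_node(graph, "k", "getitem", [split_b])

removed = coalesce_splits(graph)
```

After the call, `split_b` is gone and both `q` and `k` read from `split_a`, which is the "single split node with all users attached" shape that a downstream pattern matcher would expect. Splits with different sizes or different inputs get different keys and are left untouched.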