vllm.ir
Modules:
Functions:
- enable_torch_wrap: Context manager to enable or disable torch custom op wrapping for vLLM IR ops.
- register_op: Register a new vLLM IR op.
- set_default_torch_wrap: Permanently set the torch wrap flag.
enable_torch_wrap(enable=True)
Context manager to enable or disable torch custom op wrapping for vLLM IR ops. When torch wrapping is disabled, the torch custom op layer is skipped and IR ops dispatch directly to their implementation. This avoids torch dispatch overhead in eager mode, and on platforms that do not use Inductor it removes the need for lowering.
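The flag semantics described above can be modelled in a few lines of plain Python. The sketch below is a standalone toy, not vLLM's implementation: the module-level `_torch_wrap_enabled` variable and the `dispatch_path` helper are hypothetical. It assumes that `enable_torch_wrap` restores the previous value when the `with` block exits (the usual contract for a context manager), while `set_default_torch_wrap` simply overwrites the flag.

```python
import contextlib
from collections.abc import Iterator

# Toy stand-in for the global wrap flag (hypothetical); vLLM's real state
# lives in vllm/ir/op.py and may be organized differently.
_torch_wrap_enabled = True


def set_default_torch_wrap(enable: bool) -> None:
    """Overwrite the flag; it stays changed until something sets it again."""
    global _torch_wrap_enabled
    _torch_wrap_enabled = enable


@contextlib.contextmanager
def enable_torch_wrap(enable: bool = True) -> Iterator[None]:
    """Override the flag inside the `with` block, then restore the old value."""
    global _torch_wrap_enabled
    previous = _torch_wrap_enabled
    _torch_wrap_enabled = enable
    try:
        yield
    finally:
        # Restore even if the body raised.
        _torch_wrap_enabled = previous


def dispatch_path(op_name: str) -> str:
    """Hypothetical helper: report which path a call to `op_name` would take."""
    if _torch_wrap_enabled:
        return f"torch_custom_op:{op_name}"
    return f"direct_impl:{op_name}"


with enable_torch_wrap(False):
    print(dispatch_path("my_add"))  # direct_impl:my_add
print(dispatch_path("my_add"))      # torch_custom_op:my_add
```

Restoring the saved value in a `finally` clause keeps nested or exception-interrupted `with` blocks from leaking their setting into surrounding code.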
Source code in vllm/ir/op.py
register_op(f=None, *, name=None, activations=None, allow_inplace=False)
Register a new vLLM IR op.
Parameters:
- f (Callable | None, default: None): the native implementation of the op
- name (str | None, default: None): the name of the op; defaults to the function name
- activations (list[str] | None, default: None): list of activation parameter names; defaults to the parameters whose names start with 'x'
- allow_inplace (bool, default: False): if True, add a maybe_inplace overload that allows in-place implementations
Returns:
Example usage:
```python
import torch
import vllm.ir


@vllm.ir.register_op
def my_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y


@vllm.ir.register_op(name="custom_mul")
def multiply(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x * y
```
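Both decorator forms in the example work because `f` is optional and the remaining parameters are keyword-only. The sketch below is a self-contained toy registry that shows that pattern together with the documented defaults for `name` and `activations`. The `OpSpec` class and `REGISTRY` dict are hypothetical, not vLLM internals. Returning the undecorated function is an assumption, since the return value of the real `register_op` is not described here, and the sketch only records `allow_inplace` rather than generating a maybe_inplace overload.

```python
from __future__ import annotations

import inspect
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


@dataclass
class OpSpec:
    """Hypothetical record of a registered op (not a vLLM class)."""

    name: str
    impl: Callable[..., Any]
    activations: list[str]
    allow_inplace: bool


# Hypothetical global registry, keyed by op name.
REGISTRY: dict[str, OpSpec] = {}


def register_op(
    f: Callable[..., Any] | None = None,
    *,
    name: str | None = None,
    activations: list[str] | None = None,
    allow_inplace: bool = False,
):
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        op_name = name if name is not None else fn.__name__
        if activations is None:
            # Documented default: parameters whose names start with 'x'.
            params = inspect.signature(fn).parameters
            acts = [p for p in params if p.startswith("x")]
        else:
            acts = list(activations)
        REGISTRY[op_name] = OpSpec(op_name, fn, acts, allow_inplace)
        return fn  # assumption: the real register_op may return an op object

    # @register_op passes the function directly as f;
    # @register_op(name=...) passes f=None and gets the decorator back.
    if f is not None:
        return decorator(f)
    return decorator


@register_op
def my_add(x, y):
    return x + y


@register_op(name="custom_mul")
def multiply(x, y):
    return x * y


@register_op(allow_inplace=True)
def add_residual(x, x_residual, scale):
    # Both x and x_residual start with 'x', so both count as activations.
    return (x + x_residual) * scale
```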