vllm.model_executor.layers.fla.ops.utils
Functions:
- input_guard – A decorator to make sure all input tensors are contiguous and set the device based on input tensors.
- tensor_cache – A decorator that caches the most recent results of a function with tensor inputs.
input_guard(fn)
A decorator to make sure all input tensors are contiguous and set the device based on input tensors.
Source code in vllm/model_executor/layers/fla/ops/utils.py
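The behaviour described for input_guard can be approximated as follows. This is a minimal sketch, not the vLLM source: the name `contiguous_guard` is hypothetical, and choosing the device from the first tensor argument found is an assumption, since the page only says the device is set "based on input tensors".

```python
import contextlib
import functools

import torch


def contiguous_guard(fn):
    """Hypothetical stand-in for input_guard (illustration only)."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Make every tensor argument contiguous; leave other values untouched.
        args = tuple(
            a.contiguous() if isinstance(a, torch.Tensor) else a for a in args
        )
        kwargs = {
            k: v.contiguous() if isinstance(v, torch.Tensor) else v
            for k, v in kwargs.items()
        }
        # Assumption: the first tensor found decides which device is active.
        first = next(
            (v for v in (*args, *kwargs.values()) if isinstance(v, torch.Tensor)),
            None,
        )
        if first is not None and first.device.type == "cuda":
            ctx = torch.cuda.device(first.device)
        else:
            # CPU tensors (or no tensors at all): nothing to switch.
            ctx = contextlib.nullcontext()
        with ctx:
            return fn(*args, **kwargs)

    return wrapper


@contiguous_guard
def describe(x):
    # Report what the wrapped function actually receives.
    return x.is_contiguous(), x.stride()


transposed = torch.arange(6).reshape(2, 3).t()  # a non-contiguous view
is_contig, strides = describe(transposed)
```

Here `describe` receives a contiguous copy even though the caller passed a transposed view, which is the guarantee a kernel launcher typically wants before reading raw strides.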
tensor_cache(fn)
A decorator that caches the most recent results of a function with tensor inputs.
The decorator stores the output of the decorated function for the most recent sets of input tensors. The cache holds a fixed number of entries (default is 4); when it is full, the oldest entry is removed.
Parameters:
- fn (Callable[..., Tensor]) – The function to be decorated. It should take tensor inputs and return tensor outputs.
Returns:
- Callable[..., Tensor] – A wrapped version of the input function that caches its most recent results.
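The caching policy described for tensor_cache (a small fixed-size cache that drops its oldest entry when full) can be sketched in plain Python. This is an illustration under stated assumptions, not the vLLM implementation: the name `recent_results_cache` and its `cache_size` parameter are hypothetical, inputs are matched by object identity (`is`) rather than by value, and a cache hit does not refresh an entry's age. Plain lists stand in for tensors so the sketch runs without torch.

```python
import functools
from typing import Any, Callable


def recent_results_cache(fn: Callable[..., Any], cache_size: int = 4) -> Callable[..., Any]:
    """Hypothetical stand-in for tensor_cache (illustration only)."""
    entries = []  # (args, kwargs, result) tuples, oldest first

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        for old_args, old_kwargs, old_result in entries:
            # Assumption: inputs match only if they are the very same objects.
            same_args = len(args) == len(old_args) and all(
                a is b for a, b in zip(args, old_args)
            )
            same_kwargs = kwargs.keys() == old_kwargs.keys() and all(
                kwargs[k] is old_kwargs[k] for k in kwargs
            )
            if same_args and same_kwargs:
                return old_result  # cache hit: fn is not called again
        result = fn(*args, **kwargs)
        if len(entries) >= cache_size:
            entries.pop(0)  # cache full: drop the oldest entry
        entries.append((args, kwargs, result))
        return result

    return wrapper


calls = []  # records each real invocation of the wrapped function


@recent_results_cache
def total(values):
    calls.append(1)
    return sum(values)


data = [1, 2, 3]
first = total(data)        # miss: computes the sum
second = total(data)       # hit: same object, result reused
third = total([1, 2, 3])   # miss: equal contents but a different object
```

Matching by identity keeps the lookup cheap because no tensor contents are compared, at the cost of missing the cache when an equal but distinct object is passed, and of returning a stale result if a cached input is modified in place.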