`vllm.v1.attention.ops.chunked_prefill_paged_decode`

Functions:

has_native_kv_cache_layout – Return whether KV cache blocks can use the native ROCm pairing.

`has_native_kv_cache_layout(key_cache, value_cache)`

Return whether KV cache blocks can use the native ROCm pairing.

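The native reshape_and_cache writer assumes packed blocks, that is, blocks stored back to back so that each cache's block stride (stride(0)) equals the number of elements in one block. The function returns True only when both the key cache and the value cache satisfy this. If the cache update needs reshape_and_cache_flash because the layout is a stride-padded hybrid one, decode should use the matching Triton path as well.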

Source code in vllm/v1/attention/ops/chunked_prefill_paged_decode.py

import torch


def has_native_kv_cache_layout(
    key_cache: torch.Tensor,
    value_cache: torch.Tensor,
) -> bool:
    """Return whether KV cache blocks can use the native ROCm pairing.

    The native reshape_and_cache writer assumes packed blocks. If cache update
    needs reshape_and_cache_flash for a stride-padded hybrid layout, decode
    should use the matching Triton path too.
    """
    return (
        key_cache.stride(0) == key_cache.shape[1:].numel()
        and value_cache.stride(0) == value_cache.shape[1:].numel()
    )
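
The stride test above can be tried on small CPU tensors. The following is a minimal sketch, not part of vLLM: `is_packed` is a hypothetical helper that repeats the per-tensor comparison locally so the example runs without vLLM installed, and the cache shape `(num_blocks, block_size, num_kv_heads, head_size)` and the 64-element padding are assumptions chosen for illustration, not the layout any particular backend uses.

```python
import torch


def is_packed(cache: torch.Tensor) -> bool:
    # Hypothetical helper: the same per-tensor comparison used in
    # has_native_kv_cache_layout. The distance between consecutive blocks
    # must equal the number of elements in one block.
    return cache.stride(0) == cache.shape[1:].numel()


# Assumed illustrative shape: (num_blocks, block_size, num_kv_heads, head_size).
num_blocks, block_size, num_kv_heads, head_size = 4, 16, 2, 8
block_elems = block_size * num_kv_heads * head_size

# Packed layout: a freshly allocated contiguous tensor.
packed = torch.zeros(num_blocks, block_size, num_kv_heads, head_size)

# Stride-padded layout: each block sits in a slot with `pad` spare elements,
# and the cache is a view over the first block_elems entries of every slot.
pad = 64
backing = torch.zeros(num_blocks, block_elems + pad)
padded = backing[:, :block_elems].view(
    num_blocks, block_size, num_kv_heads, head_size
)

print(is_packed(packed), packed.stride(0))
print(is_packed(padded), padded.stride(0))
```

Both tensors have the same shape, so only the block stride distinguishes them. In this sketch the padded view fails the comparison because its block stride includes the spare elements, which is the case where the docstring says decode should follow the Triton path.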