`vllm.model_executor.layers.quantization.utils.machete_utils`

Functions:

query_machete_supported_group_sizes – Queries the supported group sizes for Machete based on the activation type.

`query_machete_supported_group_sizes(act_type)`

Queries the supported group sizes for Machete based on the activation type.

Parameters:

act_type (torch.dtype) – The activation data type, e.g. torch.float16 or torch.bfloat16.

Returns:

list[int] – A list of supported group sizes. The group size must be divisible by TileShapeK = 128 * 8 // num_bits(act_type). -1 indicates per-channel quantization.
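
The divisibility rule above can be checked by hand. The following is a minimal sketch in plain Python, not part of vLLM: the helper names `tile_shape_k` and `compatible_group_sizes` are hypothetical, and the candidate sizes (64, 128) are taken from the values the function returns. It takes the bit width as an integer instead of a `torch.dtype` so that it runs without PyTorch installed.

```python
def tile_shape_k(num_bits: int) -> int:
    """TileShapeK = 128 * 8 // num_bits, following the formula in the docstring."""
    return 128 * 8 // num_bits


def compatible_group_sizes(num_bits: int, candidates=(64, 128)) -> list[int]:
    """Hypothetical helper: keep the candidates divisible by TileShapeK.

    -1 (per-channel quantization) is always included and is exempt
    from the divisibility check.
    """
    k = tile_shape_k(num_bits)
    return [-1] + [g for g in candidates if g % k == 0]


# 16-bit activations (float16 / bfloat16): TileShapeK = 64
print(tile_shape_k(16), compatible_group_sizes(16))

# 8-bit activations (an assumed example of the "other" branch): TileShapeK = 128
print(tile_shape_k(8), compatible_group_sizes(8))
```

For 16-bit activations both 64 and 128 pass the check, which agrees with the `[-1, 64, 128]` returned for `torch.float16` and `torch.bfloat16`. With an assumed 8-bit activation type only 128 passes, which agrees with the fallback `[-1, 128]`.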

Source code in vllm/model_executor/layers/quantization/utils/machete_utils.py

import torch


def query_machete_supported_group_sizes(act_type: torch.dtype) -> list[int]:
    """
    Queries the supported group sizes for Machete based on the activation type.

    Args:
        act_type: The activation data type (torch.float16, torch.bfloat16).

    Returns:
        A list of supported group sizes. The group size must
        be divisible by `TileShapeK = 128 * 8 // num_bits(act_type)`.
        -1 indicates per-channel quantization.
    """
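    # 16-bit activations: TileShapeK = 128 * 8 // 16 = 64, so 64 and 128 both qualify.
    if act_type in [torch.float16, torch.bfloat16]:
        return [-1, 64, 128]
    else:
        # Any other activation type: only per-channel (-1) and group size 128.
        return [-1, 128]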