vllm.model_executor.layers.quantization.turboquant.centroids
Lloyd-Max optimal scalar quantizer for TurboQuant.
After a d-dimensional unit vector is rotated by a random orthogonal matrix, each coordinate is approximately distributed as N(0, 1/d); the approximation is used for d >= 64. We solve the Lloyd-Max conditions for this distribution to find the optimal centroids.
Based on: turboquant-pytorch/lloyd_max.py (Zandieh et al.)
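The N(0, 1/d) claim above can be checked empirically. The following is a minimal standard-library sketch, not code from this module. Instead of building an orthogonal matrix, it relies on the fact that rotating a fixed unit vector by a uniformly random orthogonal matrix yields a point distributed uniformly on the unit sphere, which can be sampled by normalizing a Gaussian vector. The names `random_unit_vector` and `empirical_coordinate_variance` are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Sample a point uniformly on the unit sphere in R^d.

    Assumption: this stands in for "rotate a fixed unit vector by a
    random orthogonal matrix"; the two have the same distribution.
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def empirical_coordinate_variance(d, n_samples=2000, seed=0):
    """Estimate the variance of the first coordinate over many samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        v = random_unit_vector(d, rng)
        # The coordinate mean is 0 by symmetry, so E[x^2] is the variance.
        total += v[0] ** 2
    return total / n_samples


d = 64
var = empirical_coordinate_variance(d)
print(f"empirical variance: {var:.5f}  target 1/d: {1 / d:.5f}")
```

The estimate should land close to 1/d, which is the variance the quantizer is designed around.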
Functions:
- get_centroids – Get precomputed Lloyd-Max centroids (cached).
- solve_lloyd_max – Solve Lloyd-Max optimal quantizer for N(0, 1/d) distribution.
_trapz(f, a, b, n=200)
Trapezoidal numerical integration (replaces scipy.integrate.quad).
Source code in vllm/model_executor/layers/quantization/turboquant/centroids.py
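The module's actual implementation lives in the file named above. As an independent illustration of the technique, here is a minimal sketch of a composite trapezoidal rule with the same argument shape (`f`, `a`, `b`, `n`). The name `trapz_sketch` is hypothetical, and plain Python floats are an assumption; the real helper may operate on tensors.

```python
import math


def trapz_sketch(f, a, b, n=200):
    """Composite trapezoidal rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    # Endpoints carry weight 1/2, interior points weight 1.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Example: integrate the N(0, 1/d) density over +/- 6 standard deviations.
d = 128
sigma = 1.0 / math.sqrt(d)


def gaussian_pdf(x):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


mass = trapz_sketch(gaussian_pdf, -6.0 * sigma, 6.0 * sigma)
print(f"probability mass within 6 sigma: {mass:.9f}")
```

A fixed-step rule like this is deterministic and dependency-free, which is a plausible reason to prefer it over an adaptive integrator for smooth integrands such as a Gaussian density.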
get_centroids(d, bits) cached
Get precomputed Lloyd-Max centroids (cached).
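The caching behaviour can be illustrated with a small sketch. It assumes memoization keyed on `(d, bits)` via `functools.lru_cache`; the source only says the function is cached, not how. `placeholder_solver` and `get_centroids_sketch` are hypothetical names, and the placeholder returns evenly spaced levels, not Lloyd-Max centroids.

```python
from functools import lru_cache

solver_calls = 0  # counts how often the expensive path actually runs


def placeholder_solver(d, bits):
    """Hypothetical stand-in for an expensive solve. NOT Lloyd-Max."""
    global solver_calls
    solver_calls += 1
    n_levels = 2 ** bits
    sigma = d ** -0.5
    # Evenly spaced levels in [-3*sigma, 3*sigma], for illustration only.
    return tuple(
        -3.0 * sigma + 6.0 * sigma * (i + 0.5) / n_levels
        for i in range(n_levels)
    )


@lru_cache(maxsize=None)
def get_centroids_sketch(d, bits):
    """Return the levels for (d, bits), solving at most once per key."""
    return placeholder_solver(d, bits)


first = get_centroids_sketch(128, 2)
second = get_centroids_sketch(128, 2)  # served from the cache
print(solver_calls, first is second)
```

Because the result depends only on `d` and `bits`, repeated lookups for the same configuration can reuse one solve.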
solve_lloyd_max(d, bits, max_iter=200, tol=1e-10)
Solve Lloyd-Max optimal quantizer for N(0, 1/d) distribution.
Parameters:
- d (int) – Vector dimension (determines variance = 1/d).
- bits (int) – Number of quantization bits.
- max_iter (int, default: 200) – Maximum Lloyd-Max iterations.
- tol (float, default: 1e-10) – Convergence tolerance.
Returns:
- centroids (Tensor) – Sorted tensor of 2^bits optimal centroids.
- boundaries (Tensor) – Sorted tensor of 2^bits - 1 decision boundaries.
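The Lloyd-Max iteration itself alternates two conditions: each decision boundary sits midway between adjacent centroids, and each centroid is the conditional mean of the distribution over its cell. The sketch below illustrates this for N(0, 1/d) under stated assumptions. It is not the module's code: it returns Python lists rather than tensors, it swaps numerical integration for the closed-form truncated-Gaussian mean, and the name `lloyd_max_sketch` and the initial guess are hypothetical.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lloyd_max_sketch(d, bits, max_iter=200, tol=1e-10):
    """Lloyd-Max iteration for N(0, 1/d); returns (centroids, boundaries)."""
    sigma = 1.0 / math.sqrt(d)
    n_levels = 2 ** bits
    # Work in standardized units (sigma = 1) and rescale at the end.
    # Initial guess (an assumption): evenly spaced levels in [-3, 3].
    centroids = [-3.0 + 6.0 * (i + 0.5) / n_levels for i in range(n_levels)]

    for _ in range(max_iter):
        # Condition 1: boundaries are midpoints of adjacent centroids.
        boundaries = [
            0.5 * (centroids[i] + centroids[i + 1]) for i in range(n_levels - 1)
        ]
        edges = [-math.inf] + boundaries + [math.inf]
        # Condition 2: each centroid is the conditional mean of its cell,
        # E[z | lo < z < hi] = (pdf(lo) - pdf(hi)) / (cdf(hi) - cdf(lo)).
        updated = [
            (_pdf(lo) - _pdf(hi)) / (_cdf(hi) - _cdf(lo))
            for lo, hi in zip(edges[:-1], edges[1:])
        ]
        shift = max(abs(u - c) for u, c in zip(updated, centroids))
        centroids = updated
        if shift < tol:
            break

    # Recompute boundaries so they match the final centroids.
    boundaries = [
        0.5 * (centroids[i] + centroids[i + 1]) for i in range(n_levels - 1)
    ]
    return [sigma * c for c in centroids], [sigma * b for b in boundaries]


centroids, boundaries = lloyd_max_sketch(d=64, bits=2)
print(len(centroids), len(boundaries))
```

For `bits=1` the fixed point is known in closed form: the two centroids are ±σ·sqrt(2/π) with σ = 1/sqrt(d), and the single boundary is 0, which gives a quick sanity check for any implementation.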