vllm.model_executor.kernels.linear.nvfp4.flashinfer
Classes:
- FlashInferB12xNvFp4LinearKernel – NVFP4 GEMM via FlashInfer's b12x CuTe DSL warp-level MMA kernel (SM120+).
- FlashInferCudnnNvFp4LinearKernel – NVFP4 GEMM via FlashInfer's cuDNN wrapper.
- FlashInferCutlassNvFp4LinearKernel – NVFP4 GEMM via FlashInfer's CUTLASS wrapper.
- FlashInferTrtllmNvFp4LinearKernel – NVFP4 GEMM via FlashInfer's TensorRT-LLM wrapper.
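All four classes compute the same operation, an NVFP4 GEMM, through different FlashInfer backends. As background, the sketch below illustrates the block-scaled 4-bit idea behind NVFP4 in plain Python. It is not taken from vLLM or FlashInfer: the names `quantize_block` and `dequantize_block` are hypothetical, the block scale is kept as an ordinary Python float instead of being rounded to FP8, and the per-tensor scale is omitted. It assumes the commonly described layout of E2M1 elements grouped in blocks of 16.

```python
# Illustrative only: a simplified model of block-scaled FP4 (E2M1) quantization.
# Assumptions (not from the vLLM source): block size 16, scale = amax / 6,
# scale stored as a Python float (real NVFP4 stores it in FP8), no per-tensor scale.

E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # magnitudes an E2M1 value can take
BLOCK_SIZE = 16


def quantize_block(values):
    """Return (scale, codes) where each code is a signed E2M1 value."""
    if len(values) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    # Map the largest magnitude in the block onto the largest E2M1 magnitude (6.0).
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in values:
        # Pick the nearest representable magnitude, then restore the sign.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))
        codes.append(-mag if v < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from a scale and E2M1 codes."""
    return [scale * c for c in codes]
```

A GEMM over such data conceptually multiplies the dequantized blocks; the kernels listed above do this on packed 4-bit data in hardware rather than materializing the dequantized values.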
FlashInferB12xNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's b12x CuTe DSL warp-level MMA kernel (SM120+).
Source code in vllm/model_executor/kernels/linear/nvfp4/flashinfer.py
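The "SM120+" note on FlashInferB12xNvFp4LinearKernel refers to the GPU's compute capability. The sketch below shows one way such a gate could be expressed. The helpers `sm_version` and `meets_sm120` are hypothetical and are not part of vLLM or FlashInfer; how the class actually checks support is not shown on this page.

```python
# Hypothetical helpers illustrating an "SM120+" gate; not vLLM's actual check.

def sm_version(major: int, minor: int) -> int:
    """Convert a (major, minor) compute capability into an SM number, e.g. (12, 0) -> 120."""
    return major * 10 + minor


def meets_sm120(capability: tuple[int, int]) -> bool:
    """Return True when the compute capability is SM120 or newer."""
    return sm_version(*capability) >= 120
```

With PyTorch installed, the tuple could come from `torch.cuda.get_device_capability()`, which returns `(major, minor)` for the current CUDA device.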
FlashInferCudnnNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's cuDNN wrapper.
Source code in vllm/model_executor/kernels/linear/nvfp4/flashinfer.py
FlashInferCutlassNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's CUTLASS wrapper.
Source code in vllm/model_executor/kernels/linear/nvfp4/flashinfer.py
FlashInferTrtllmNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's TensorRT-LLM wrapper.