vllm.config.kernel ¶
KernelConfig ¶
Configuration for kernel selection and warmup behavior.
Source code in vllm/config/kernel.py
enable_flashinfer_autotune class-attribute instance-attribute ¶
enable_flashinfer_autotune: bool = Field(default=None)
If True, run FlashInfer autotuning during kernel warmup.
moe_backend class-attribute instance-attribute ¶
Backend for MoE expert computation kernels. Available options:
- "auto": Automatically select the best backend based on model and hardware
- "triton": Use Triton-based fused MoE kernels
- "deep_gemm": Use DeepGEMM kernels (FP8 block-quantized only)
- "cutlass": Use vLLM CUTLASS kernels
- "flashinfer_trtllm": Use FlashInfer with TRTLLM-GEN kernels
- "flashinfer_cutlass": Use FlashInfer with CUTLASS kernels
- "flashinfer_cutedsl": Use FlashInfer with CuteDSL kernels (FP4 only)
- "marlin": Use Marlin kernels (weight-only quantization)
- "aiter": Use AMD AITer kernels (ROCm only)
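As a rough illustration of how the option names above could be checked before they reach a config object, here is a minimal standard-library sketch. The alias `MoEBackendName` and the helper `check_moe_backend` are hypothetical names introduced for this example; they are not part of vLLM, and the actual validation in vllm/config/kernel.py may differ.

```python
from typing import Literal, get_args

# Option names copied from the list above. The alias name is hypothetical.
MoEBackendName = Literal[
    "auto",
    "triton",
    "deep_gemm",
    "cutlass",
    "flashinfer_trtllm",
    "flashinfer_cutlass",
    "flashinfer_cutedsl",
    "marlin",
    "aiter",
]


def check_moe_backend(name: str) -> str:
    """Return `name` unchanged if it is a documented option, else raise."""
    allowed = get_args(MoEBackendName)
    if name not in allowed:
        raise ValueError(
            f"unknown moe_backend {name!r}; expected one of {allowed}"
        )
    return name
```

A check like this only confirms that the string is a known option. It does not confirm that the backend fits the model or hardware, such as the FP8, FP4, or ROCm restrictions noted in the list.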
_skip_none_validation classmethod ¶
Skip validation when the value is None, which is the case while initialization is delayed.
Source code in vllm/config/kernel.py
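The None-skipping behavior described for `_skip_none_validation` can be pictured as a wrap-style validator: it receives the raw value plus the normal validation handler, and only calls the handler when a value is present. The sketch below is a plain-Python analog, not vLLM's code. The function names and the `strict_bool` handler are assumptions made for illustration.

```python
from typing import Any, Callable


def skip_none_validation(value: Any, handler: Callable[[Any], Any]) -> Any:
    """Pass None through untouched; otherwise delegate to the handler."""
    if value is None:
        # Field has not been initialized yet, so there is nothing to validate.
        return value
    return handler(value)


def strict_bool(value: Any) -> bool:
    """Hypothetical stand-in for the field's normal validation step."""
    if not isinstance(value, bool):
        raise TypeError(f"expected bool, got {type(value).__name__}")
    return value
```

Under this pattern, a field such as `enable_flashinfer_autotune` can be annotated as `bool` while still starting out as None until a later initialization step fills it in.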
compute_hash ¶
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
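The factors-list idea in the warning above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation in vllm/config/kernel.py: the class `SketchKernelConfig` is hypothetical, the choice of SHA-256 is an assumption, and treating `moe_backend` as graph-affecting while leaving out `enable_flashinfer_autotune` is made up purely to show the mechanism.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SketchKernelConfig:
    """Hypothetical stand-in for a config class; not vLLM's KernelConfig."""

    moe_backend: str = "auto"
    enable_flashinfer_autotune: Optional[bool] = None

    def compute_hash(self) -> str:
        # Assumption for this sketch: only moe_backend changes the structure
        # of the computation graph, so it is the only factor listed.
        # A new graph-affecting field would have to be appended here.
        factors: List[Any] = [self.moe_backend]
        return hashlib.sha256(str(factors).encode()).hexdigest()
```

In this sketch, two configs that differ only in a field left out of `factors` produce the same hash, while changing a listed factor produces a different one. That is why a graph-affecting field missing from the list would go unnoticed by anything keyed on the hash.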