vllm.model_executor.layers.quantization.utils.flashinfer_fp4_moe
Utility helpers for the NVFP4 + FlashInfer fused-MoE path.
is_flashinfer_fp4_cutlass_moe_available
is_flashinfer_fp4_cutlass_moe_available() -> bool
Return True when FlashInfer CUTLASS NV-FP4 kernels can be used.
Source code in vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
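A minimal usage sketch for the availability check, assuming only the module path and zero-argument signature shown above. The `use_flashinfer_cutlass` variable and the fallback branch are illustrative, not part of vLLM; the guard simply treats a missing vLLM install as "not available".

```python
# Hedged sketch: guard an optional code path on the availability check.
# The import path and signature come from this page; everything else is illustrative.
try:
    from vllm.model_executor.layers.quantization.utils.flashinfer_fp4_moe import (
        is_flashinfer_fp4_cutlass_moe_available,
    )

    use_flashinfer_cutlass = bool(is_flashinfer_fp4_cutlass_moe_available())
except ImportError:
    # vLLM (or one of its dependencies) is not importable in this environment.
    use_flashinfer_cutlass = False

if use_flashinfer_cutlass:
    print("FlashInfer CUTLASS NV-FP4 MoE kernels can be used")
else:
    print("Falling back to another MoE implementation")
```

Calling the check once and caching the result in a flag keeps the decision in one place instead of scattering it across call sites.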
reorder_w1w3_to_w3w1
Re-order the concatenated [w1, w3] tensors to [w3, w1].
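The re-ordering can be illustrated with a small sketch. This page does not show the function's signature, so the following is not vLLM's implementation: `swap_halves` is a hypothetical helper, NumPy arrays stand in for torch tensors, and the concatenation axis is assumed to be the second-to-last dimension.

```python
import numpy as np


def swap_halves(x: np.ndarray, dim: int = -2) -> np.ndarray:
    """Hypothetical helper: turn [a, b] concatenated along `dim` into [b, a]."""
    size = x.shape[dim]
    if size % 2 != 0:
        raise ValueError(f"Expected an even size in dim {dim}, got {size}")
    # Split into the two equal halves, then concatenate them in swapped order.
    first, second = np.split(x, 2, axis=dim)
    return np.ascontiguousarray(np.concatenate([second, first], axis=dim))


# Toy data: one expert, w1 is all ones and w3 is all threes (2 rows each, 3 columns).
w1 = np.ones((1, 2, 3))
w3 = np.full((1, 2, 3), 3.0)
w13 = np.concatenate([w1, w3], axis=-2)  # layout [w1, w3]

w31 = swap_halves(w13)  # layout [w3, w1]
```

In this sketch, any per-block scale array that was concatenated in the same [w1, w3] order would need the same swap so that weights and scales stay aligned.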