vllm.config.profiler ¶
ProfilerConfig ¶
Dataclass which contains profiler config for the engine.
Source code in vllm/config/profiler.py
active_iterations class-attribute instance-attribute ¶
active_iterations: int = Field(default=5, ge=1)
Number of active iterations for PyTorch profiler schedule. This is the number of iterations where profiling data is actually collected. Defaults to 5 active iterations.
delay_iterations class-attribute instance-attribute ¶
delay_iterations: int = Field(default=0, ge=0)
Number of engine iterations to skip before starting profiling. Defaults to 0, meaning profiling starts immediately after receiving /start_profile.
ignore_frontend class-attribute instance-attribute ¶
ignore_frontend: bool = False
If True, disables the front-end profiling of AsyncLLM when using the 'torch' profiler. This is needed to reduce overhead when using delay/limit options, since the front-end profiling does not track iterations and will capture the entire range.
max_iterations class-attribute instance-attribute ¶
max_iterations: int = Field(default=0, ge=0)
Maximum number of engine iterations to profile after starting profiling. Defaults to 0, meaning no limit.
profiler class-attribute instance-attribute ¶
Which profiler to use. Defaults to None. Options are:
- 'torch': Use PyTorch profiler.
- 'cuda': Use CUDA profiler.
torch_profiler_dir class-attribute instance-attribute ¶
torch_profiler_dir: str = ''
Directory to save torch profiler traces. Both AsyncLLM's CPU traces and worker's traces (CPU & GPU) will be saved under this directory. Note that it must be an absolute path.
torch_profiler_dump_cuda_time_total class-attribute instance-attribute ¶
torch_profiler_dump_cuda_time_total: bool = True
If True, dumps total CUDA time in torch profiler traces. Enabled by default.
torch_profiler_record_shapes class-attribute instance-attribute ¶
torch_profiler_record_shapes: bool = False
If True, records tensor shapes in the torch profiler. Disabled by default.
torch_profiler_use_gzip class-attribute instance-attribute ¶
torch_profiler_use_gzip: bool = True
If True, saves torch profiler traces in gzip format. Enabled by default.
torch_profiler_with_flops class-attribute instance-attribute ¶
torch_profiler_with_flops: bool = False
If True, enables FLOPS counting in the torch profiler. Disabled by default.
torch_profiler_with_memory class-attribute instance-attribute ¶
torch_profiler_with_memory: bool = False
If True, enables memory profiling in the torch profiler. Disabled by default.
torch_profiler_with_stack class-attribute instance-attribute ¶
torch_profiler_with_stack: bool = False
If True, enables stack tracing in the torch profiler. Disabled by default to reduce overhead. Can be enabled via VLLM_TORCH_PROFILER_WITH_STACK=1 env var or --profiler-config.torch_profiler_with_stack=true CLI flag.
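As a usage sketch, stack tracing can be enabled with the CLI flag quoted above (the flag and environment variable names are taken verbatim from this documentation; the model name is a placeholder):

```shell
# Enable stack tracing via the CLI flag; equivalently, set
# VLLM_TORCH_PROFILER_WITH_STACK=1 in the environment.
vllm serve "$MODEL" --profiler-config.torch_profiler_with_stack=true
```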
wait_iterations class-attribute instance-attribute ¶
wait_iterations: int = Field(default=0, ge=0)
Number of wait iterations for PyTorch profiler schedule. During wait, the profiler is completely off with zero overhead. This allows skipping initial iterations before warmup begins. Defaults to 0 (no wait period).
warmup_iterations class-attribute instance-attribute ¶
warmup_iterations: int = Field(default=0, ge=0)
Number of warmup iterations for PyTorch profiler schedule. During warmup, the profiler runs but data is discarded. This helps reduce noise from JIT compilation and other one-time costs in the profiled trace. Defaults to 0 (schedule-based profiling disabled, recording all iterations). Set to a positive value (e.g., 2) to enable schedule-based profiling.
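The wait/warmup/active fields above follow the semantics of a PyTorch profiler schedule. A minimal sketch of how an iteration index maps to a profiler action under those semantics (the function name and string labels are illustrative, not part of vLLM's API):

```python
def schedule_action(step: int, wait: int = 0, warmup: int = 0, active: int = 5) -> str:
    """Return the profiler action for a given engine iteration.

    Mirrors the documented schedule semantics: during `wait` the profiler
    is fully off, during `warmup` it runs but discards data, and during
    `active` data is actually collected.
    """
    if step < wait:
        return "NONE"      # profiler off, zero overhead
    if step < wait + warmup:
        return "WARMUP"    # profiler runs, data discarded
    if step < wait + warmup + active:
        return "RECORD"    # profiling data collected
    return "NONE"          # past the active window
```

For example, with wait=2, warmup=1, active=3, iterations 0-1 are skipped, iteration 2 warms up, and iterations 3-5 are recorded.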
compute_hash ¶
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
Source code in vllm/config/profiler.py
_is_uri_path ¶
Check if path is a URI (scheme://...), excluding Windows drive letters.
Supports custom URI schemes like gs://, s3://, hdfs://, etc. These paths should not be converted to absolute paths.
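A hedged sketch of the check described above (the function name and regex are assumptions, not vLLM's actual implementation): require a scheme of at least two characters before `://`, so a single Windows drive letter like `C://...` is not mistaken for a URI.

```python
import re

# Scheme of length >= 2 followed by "://"; single-letter "schemes"
# (Windows drive letters) deliberately do not match.
_URI_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+://")

def is_uri_path(path: str) -> bool:
    """Return True if `path` looks like a URI such as gs:// or s3://."""
    return bool(_URI_RE.match(path))
```

Paths that match are left as-is rather than converted to absolute local paths.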