vllm.config.profiler ¶
Classes:
- ProfilerConfig – Dataclass which contains profiler config for the engine.
ProfilerConfig ¶
Dataclass which contains profiler config for the engine.
Methods:
- compute_hash – WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Attributes:
- active_iterations (int) – Number of active iterations for PyTorch profiler schedule.
- delay_iterations (int) – Number of engine iterations to skip before starting profiling.
- ignore_frontend (bool) – If True, disables the front-end profiling of AsyncLLM when using the 'torch' profiler.
- max_iterations (int) – Maximum number of engine iterations to profile after starting profiling.
- profiler (ProfilerKind | None) – Which profiler to use. Defaults to None. Options are 'torch' and 'cuda'.
- torch_profiler_dir (str) – Directory to save torch profiler traces. Both AsyncLLM's CPU traces and worker's traces (CPU & GPU) will be saved under this directory.
- torch_profiler_dump_cuda_time_total (bool) – If True, dumps total CUDA time in torch profiler traces. Enabled by default.
- torch_profiler_record_shapes (bool) – If True, records tensor shapes in the torch profiler. Disabled by default.
- torch_profiler_use_gzip (bool) – If True, saves torch profiler traces in gzip format. Enabled by default.
- torch_profiler_with_flops (bool) – If True, enables FLOPS counting in the torch profiler. Disabled by default.
- torch_profiler_with_memory (bool) – If True, enables memory profiling in the torch profiler. Disabled by default.
- torch_profiler_with_stack (bool) – If True, enables stack tracing in the torch profiler. Enabled by default.
- wait_iterations (int) – Number of wait iterations for PyTorch profiler schedule.
- warmup_iterations (int) – Number of warmup iterations for PyTorch profiler schedule.
Source code in vllm/config/profiler.py
active_iterations = Field(default=5, ge=1) class-attribute instance-attribute ¶
Number of active iterations for PyTorch profiler schedule. This is the number of iterations where profiling data is actually collected. Defaults to 5 active iterations.
delay_iterations = Field(default=0, ge=0) class-attribute instance-attribute ¶
Number of engine iterations to skip before starting profiling. Defaults to 0, meaning profiling starts immediately after receiving /start_profile.
ignore_frontend = False class-attribute instance-attribute ¶
If True, disables the front-end profiling of AsyncLLM when using the 'torch' profiler. This is needed to reduce overhead when using the delay_iterations or max_iterations options, since the front-end profiling does not track iterations and will capture the entire range.
max_iterations = Field(default=0, ge=0) class-attribute instance-attribute ¶
Maximum number of engine iterations to profile after starting profiling. Defaults to 0, meaning no limit.
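As a rough illustration of how delay_iterations and max_iterations combine, the sketch below checks whether a given engine iteration falls inside the profiled window. The helper name profiled_window is hypothetical, and the logic only restates the semantics described above; it is not vLLM's implementation.

```python
def profiled_window(iteration: int,
                    delay_iterations: int = 0,
                    max_iterations: int = 0) -> bool:
    """Return True if a 0-based engine iteration, counted from the
    /start_profile request, would be profiled (assumed semantics)."""
    # Iterations before the delay has elapsed are skipped.
    if iteration < delay_iterations:
        return False
    # A limit of 0 means "no limit": everything after the delay is profiled.
    if max_iterations == 0:
        return True
    # Otherwise only max_iterations iterations are profiled after the delay.
    return iteration < delay_iterations + max_iterations


# Skip 2 iterations, then profile at most 3.
profiled = [i for i in range(8)
            if profiled_window(i, delay_iterations=2, max_iterations=3)]
print(profiled)
```

With the defaults (both 0), every iteration after /start_profile is inside the window.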
profiler = None class-attribute instance-attribute ¶
Which profiler to use. Defaults to None. Options are:
- 'torch': Use PyTorch profiler.
- 'cuda': Use CUDA profiler.
torch_profiler_dir = '' class-attribute instance-attribute ¶
Directory to save torch profiler traces. Both AsyncLLM's CPU traces and worker's traces (CPU & GPU) will be saved under this directory. Note that it must be an absolute path.
torch_profiler_dump_cuda_time_total = True class-attribute instance-attribute ¶
If True, dumps total CUDA time in torch profiler traces. Enabled by default.
torch_profiler_record_shapes = False class-attribute instance-attribute ¶
If True, records tensor shapes in the torch profiler. Disabled by default.
torch_profiler_use_gzip = True class-attribute instance-attribute ¶
If True, saves torch profiler traces in gzip format. Enabled by default.
torch_profiler_with_flops = False class-attribute instance-attribute ¶
If True, enables FLOPS counting in the torch profiler. Disabled by default.
torch_profiler_with_memory = False class-attribute instance-attribute ¶
If True, enables memory profiling in the torch profiler. Disabled by default.
torch_profiler_with_stack = True class-attribute instance-attribute ¶
If True, enables stack tracing in the torch profiler. Enabled by default because it is useful for debugging. Can be disabled via the --profiler-config.torch_profiler_with_stack=false CLI flag.
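The dotted flag form shown above can be generated from a plain dictionary of settings. The sketch below is only an illustration: to_cli_flags is a hypothetical helper, the trace directory is a placeholder, and it assumes the other fields accept the same --profiler-config.<field>=<value> form as the torch_profiler_with_stack flag documented here.

```python
def to_cli_flags(prefix: str, options: dict) -> list[str]:
    """Render settings as dotted CLI flags (hypothetical helper)."""
    flags = []
    for name, value in options.items():
        # Booleans are written in lower case, matching the "=false" form above.
        if isinstance(value, bool):
            value = str(value).lower()
        flags.append(f"--{prefix}.{name}={value}")
    return flags


flags = to_cli_flags(
    "profiler-config",
    {
        "profiler": "torch",
        "torch_profiler_dir": "/tmp/vllm_traces",  # placeholder absolute path
        "torch_profiler_with_stack": False,
    },
)
print(" ".join(flags))
```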
wait_iterations = Field(default=0, ge=0) class-attribute instance-attribute ¶
Number of wait iterations for PyTorch profiler schedule. During wait, the profiler is completely off with zero overhead. This allows skipping initial iterations before warmup begins. Defaults to 0 (no wait period).
warmup_iterations = Field(default=0, ge=0) class-attribute instance-attribute ¶
Number of warmup iterations for PyTorch profiler schedule. During warmup, the profiler runs but data is discarded. This helps reduce noise from JIT compilation and other one-time costs in the profiled trace. Defaults to 0 (schedule-based profiling disabled, recording all iterations). Set to a positive value (e.g., 2) to enable schedule-based profiling.
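The wait, warmup, and active settings describe consecutive phases of one profiling cycle. The sketch below maps a step index to its phase under the semantics stated above. The function schedule_phase is hypothetical, it models a single cycle only (what happens after the active phase is an assumption), and it treats warmup_iterations=0 as "schedule disabled, record everything" as described.

```python
def schedule_phase(step: int, wait: int = 0, warmup: int = 0, active: int = 5) -> str:
    """Return the assumed profiler phase for a 0-based step index."""
    # Per the description above, warmup == 0 disables schedule-based
    # profiling, so every iteration is recorded.
    if warmup == 0:
        return "active"
    if step < wait:
        return "wait"      # profiler off, zero overhead
    if step < wait + warmup:
        return "warmup"    # profiler running, data discarded
    if step < wait + warmup + active:
        return "active"    # data actually collected
    return "done"          # assumption: one cycle only, nothing recorded after


phases = [schedule_phase(s, wait=1, warmup=2, active=3) for s in range(7)]
print(phases)
```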
compute_hash() ¶
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
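A minimal sketch of the factors-list idea, using a hypothetical ExampleConfig rather than ProfilerConfig itself: only fields that affect the computation graph are placed in the list, so changing any other field leaves the hash unchanged. The field names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    """Hypothetical config used only to illustrate the pattern."""
    hidden_size: int = 1024   # assumed to affect the computation graph
    trace_dir: str = ""       # assumed NOT to affect the computation graph

    def compute_hash(self) -> str:
        # Only graph-affecting fields go into the factors list.
        factors = [self.hidden_size]
        return hashlib.sha256(str(factors).encode()).hexdigest()


same_a = ExampleConfig(trace_dir="/a").compute_hash()
same_b = ExampleConfig(trace_dir="/b").compute_hash()
different = ExampleConfig(hidden_size=2048).compute_hash()
print(same_a == same_b, same_a == different)
```

A field left out of the list by mistake would let two structurally different configurations share a hash, which is the risk the warning above is pointing at.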
_is_uri_path(path) ¶
Check if path is a URI (scheme://...), excluding Windows drive letters.
Supports custom URI schemes like gs://, s3://, hdfs://, etc. These paths should not be converted to absolute paths.
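One way such a check could be written is shown below. This is a sketch under assumptions, not the function's actual source: looks_like_uri is a hypothetical name, and requiring a scheme of at least two characters is one possible way to keep single-letter Windows drive prefixes from matching.

```python
import re

# Scheme: a letter followed by at least one more scheme character, then "://".
# Requiring two or more characters excludes single-letter drive prefixes.
_URI_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]+://")


def looks_like_uri(path: str) -> bool:
    """Return True if path starts with a multi-character scheme and '://'."""
    return bool(_URI_SCHEME.match(path))


for candidate in ["gs://bucket/traces", "s3://bucket/traces",
                  "hdfs://host/traces", "C:\\traces", "/var/traces"]:
    print(candidate, looks_like_uri(candidate))
```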