vllm.config.observability
Classes:
- ObservabilityConfig: Configuration for observability (metrics and tracing).
ObservabilityConfig
Configuration for observability: metrics and tracing.
Methods:
- compute_hash: Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph.
Attributes:
- collect_detailed_traces (list[DetailedTraceModules] | None): Collect detailed traces for the specified modules. Only meaningful if --otlp-traces-endpoint is set.
- collect_model_execute_time (bool): Whether to collect model execute time for the request.
- collect_model_forward_time (bool): Whether to collect model forward time for the request.
- cudagraph_metrics (bool): Enable CUDA graph metrics (number of padded/unpadded tokens, runtime CUDA graph dispatch modes, and their observed frequencies).
- enable_layerwise_nvtx_tracing (bool): Enable layerwise NVTX tracing of each layer or module in the model.
- enable_logging_iteration_details (bool): Enable detailed logging of iteration details.
- enable_mfu_metrics (bool): Enable Model FLOPs Utilization (MFU) metrics.
- enable_mm_processor_stats (bool): Enable collection of timing statistics for multimodal processor operations.
- kv_cache_metrics (bool): Enable KV cache residency metrics (lifetime, idle time, reuse gaps).
- kv_cache_metrics_sample (float): Sampling rate for KV cache metrics, in the range (0.0, 1.0]. The default of 0.01 samples 1% of blocks.
- otlp_traces_endpoint (str | None): Target URL to which OpenTelemetry traces will be sent.
- show_hidden_metrics (bool): Check if the hidden metrics should be shown.
- show_hidden_metrics_for_version (str | None): Enable deprecated Prometheus metrics that have been hidden since the specified version.
Source code in vllm/config/observability.py
collect_detailed_traces = None class-attribute instance-attribute
Setting this only makes sense if --otlp-traces-endpoint is also set. If set, detailed traces are collected for the specified modules. This involves possibly costly and/or blocking operations and may therefore impact performance.
Note that collecting detailed timing information for each request can be expensive.
collect_model_execute_time cached property
Whether to collect model execute time for the request.
collect_model_forward_time cached property
Whether to collect model forward time for the request.
cudagraph_metrics = False class-attribute instance-attribute
Enable CUDA graph metrics (number of padded/unpadded tokens, runtime CUDA graph dispatch modes, and their observed frequencies at every logging interval).
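To make "padded/unpadded tokens" and "dispatch mode frequencies" concrete, here is a sketch under stated assumptions: a batch is padded up to the smallest captured graph size that fits it, and batches larger than every captured size run eagerly without padding. CAPTURE_SIZES, dispatch, CudaGraphStats, and the mode labels "FULL" and "NONE" are hypothetical; this is not how vLLM's dispatcher is implemented.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical token counts for which CUDA graphs were captured (sorted ascending).
CAPTURE_SIZES = [1, 2, 4, 8, 16, 32]


def dispatch(num_tokens: int) -> tuple[str, int]:
    """Return (mode, padded_tokens) for one step; mode labels are illustrative."""
    idx = bisect_left(CAPTURE_SIZES, num_tokens)
    if idx == len(CAPTURE_SIZES):
        # Larger than any captured graph: run eagerly with no padding.
        return "NONE", num_tokens
    # Pad up to the smallest captured size that fits the batch.
    return "FULL", CAPTURE_SIZES[idx]


class CudaGraphStats:
    """Accumulates token counts and mode frequencies for one logging interval."""

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self.unpadded = 0
        self.padded = 0
        self.modes: Counter = Counter()

    def record(self, num_tokens: int) -> None:
        mode, padded = dispatch(num_tokens)
        self.unpadded += num_tokens
        self.padded += padded
        self.modes[mode] += 1

    def flush(self) -> dict:
        """Return this interval's summary and start a new interval."""
        summary = {"unpadded": self.unpadded, "padded": self.padded, "modes": dict(self.modes)}
        self._reset()
        return summary
```

The difference between padded and unpadded totals is the compute spent on padding, which is the kind of overhead these metrics are meant to surface.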
enable_layerwise_nvtx_tracing = False class-attribute instance-attribute
Enable layerwise NVTX tracing. This traces the execution of each layer or module in the model and attaches information such as input/output shapes to NVTX range markers. Note that this does not work when CUDA graphs are enabled.
enable_logging_iteration_details = False class-attribute instance-attribute
Enable detailed logging of iteration details. If set, the vLLM EngineCore logs details for each iteration, including the number of context/generation requests and tokens and the elapsed CPU time for the iteration.
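A sketch of the kind of per-iteration summary described above. Request, iteration_details, and the step callback are hypothetical; the sketch assumes "context" means prefill and "generation" means decode, and it measures CPU time with time.process_time(). The actual log format used by EngineCore is not reproduced here.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    """Hypothetical scheduled request: tokens processed this step, and its phase."""

    num_tokens: int
    is_prefill: bool  # True for context (prefill), False for generation (decode)


def iteration_details(batch: list[Request], step: Callable[[list[Request]], None]) -> dict:
    """Run one hypothetical engine step and summarize it."""
    start = time.process_time()  # CPU time of this process, not wall-clock time
    step(batch)
    elapsed = time.process_time() - start
    ctx = [r for r in batch if r.is_prefill]
    gen = [r for r in batch if not r.is_prefill]
    return {
        "context_requests": len(ctx),
        "context_tokens": sum(r.num_tokens for r in ctx),
        "generation_requests": len(gen),
        "generation_tokens": sum(r.num_tokens for r in gen),
        "elapsed_cpu_s": elapsed,
    }
```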
enable_mfu_metrics = False class-attribute instance-attribute
Enable Model FLOPs Utilization (MFU) metrics.
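MFU is the ratio of the FLOP rate a model actually achieves to the hardware's peak FLOP rate. The sketch below uses the common rule of thumb of about 2 × num_params FLOPs per token for a dense decoder-only forward pass, which ignores attention FLOPs; how vLLM itself counts FLOPs is not described here, and the function name mfu is hypothetical.

```python
def mfu(num_params: float, tokens_per_second: float, peak_flops: float) -> float:
    """Approximate Model FLOPs Utilization for a dense decoder-only model.

    Uses the ~2 * num_params FLOPs-per-token forward-pass estimate, which
    ignores attention FLOPs that grow with context length.
    """
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    achieved_flops = 2.0 * num_params * tokens_per_second  # FLOP/s actually done
    return achieved_flops / peak_flops
```

For example, a 7B-parameter model processing 1,000 tokens/s on hardware with a (placeholder) peak of 1e15 FLOP/s gives 2 × 7e9 × 1000 / 1e15 = 0.014, i.e. 1.4% MFU.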
enable_mm_processor_stats = False class-attribute instance-attribute
Enable collection of timing statistics for multimodal processor operations. This is for internal use only (e.g., benchmarks) and is not exposed as a CLI argument.
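A minimal sketch of per-operation timing statistics, assuming each multimodal processing step can be wrapped in a timing context. ProcessorStats and the operation name "resize" are hypothetical and do not correspond to vLLM internals.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ProcessorStats:
    """Hypothetical accumulator of wall-clock time per preprocessing operation."""

    def __init__(self) -> None:
        self.total_s: defaultdict = defaultdict(float)
        self.count: defaultdict = defaultdict(int)

    @contextmanager
    def timed(self, op: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the operation raised, so failures are still timed.
            self.total_s[op] += time.perf_counter() - start
            self.count[op] += 1

    def mean_ms(self, op: str) -> float:
        return 1000.0 * self.total_s[op] / self.count[op]
```

Usage: `with stats.timed("resize"): ...` around each operation, then read `stats.mean_ms("resize")` in a benchmark report.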
kv_cache_metrics = False class-attribute instance-attribute
Enable KV cache residency metrics (lifetime, idle time, reuse gaps). Uses sampling to minimize overhead. Requires log stats to be enabled (i.e., --disable-log-stats must not be set).
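The three residency metrics can be read as follows, under assumed definitions: lifetime runs from allocation to eviction, idle time from the last use to eviction, and reuse gaps are the intervals between consecutive uses. BlockResidency and these exact definitions are a hypothetical sketch, not vLLM's tracker.

```python
from dataclasses import dataclass, field


@dataclass
class BlockResidency:
    """Hypothetical lifecycle record for one sampled KV cache block (seconds)."""

    allocated_at: float
    accesses: list[float] = field(default_factory=list)

    def touch(self, t: float) -> None:
        self.accesses.append(t)

    def summarize(self, evicted_at: float) -> dict:
        last_use = self.accesses[-1] if self.accesses else self.allocated_at
        gaps = [b - a for a, b in zip(self.accesses, self.accesses[1:])]
        return {
            "lifetime": evicted_at - self.allocated_at,  # allocation to eviction
            "idle_time": evicted_at - last_use,  # last use to eviction
            "reuse_gaps": gaps,  # time between consecutive uses
        }
```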
kv_cache_metrics_sample = Field(default=0.01, gt=0, le=1) class-attribute instance-attribute
Sampling rate for KV cache metrics, in the range (0.0, 1.0]. The default of 0.01 samples 1% of blocks.
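The Field(gt=0, le=1) constraint above means 0 is rejected and 1 is accepted. The sketch below mirrors that range check in plain Python and pairs it with a simple per-block Bernoulli sampler; validate_sample_rate and sample_blocks are hypothetical names, and vLLM's actual sampling strategy may differ.

```python
import random


def validate_sample_rate(rate: float) -> float:
    """Mirror the documented constraint 0 < rate <= 1 (also rejects NaN)."""
    if not 0.0 < rate <= 1.0:
        raise ValueError(f"kv_cache_metrics_sample must be in (0.0, 1.0], got {rate}")
    return rate


def sample_blocks(block_ids, rate: float, rng: random.Random) -> list[int]:
    """Keep each block independently with probability `rate`."""
    validate_sample_rate(rate)
    return [b for b in block_ids if rng.random() < rate]
```

With the default rate of 0.01, roughly 1 in 100 blocks is tracked, so the per-block bookkeeping cost shrinks by about the same factor.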
otlp_traces_endpoint = None class-attribute instance-attribute
Target URL to which OpenTelemetry traces will be sent.
show_hidden_metrics cached property
Check if the hidden metrics should be shown.
show_hidden_metrics_for_version = None class-attribute instance-attribute
Enable deprecated Prometheus metrics that have been hidden since the specified version. For example, if a previously deprecated metric has been hidden since the v0.7.0 release, you can use --show-hidden-metrics-for-version=0.7 as a temporary escape hatch while you migrate to new metrics. The metric is likely to be removed completely in an upcoming release.
_validate_collect_detailed_traces(value) classmethod
Handle the legacy case where users might provide a comma-separated string instead of a list of strings.
compute_hash()
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
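The WARNING above boils down to a rule: only fields that change the computation graph go into the hashed factors list, so toggling a pure observability switch must not change the hash. The sketch illustrates this with a hypothetical config: use_fused_kernel is an invented graph-affecting field and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical config mixing a metrics switch with a graph-affecting field."""

    enable_mfu_metrics: bool = False  # observability only: excluded from the hash
    use_fused_kernel: bool = False  # hypothetical field that alters the graph

    def compute_hash(self) -> str:
        # Only fields that change the computation graph belong in `factors`.
        factors = [self.use_fused_kernel]
        return hashlib.sha256(str(factors).encode()).hexdigest()
```

Forgetting to add a new graph-affecting field to factors would let two different graphs share a hash, which is the failure the WARNING guards against.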