vllm.entrypoints.serve.disagg.protocol ¶
Classes:
-
GenerateRequest– -
MultiModalFeatures–Lightweight multimodal metadata produced by the render step.
-
PlaceholderRangeInfo–Serializable placeholder location for a single multi-modal item.
GenerateRequest ¶
Bases: BaseModel
Methods:
-
is_sampling_param_provided–Whether the caller explicitly set
sampling_params.<name>.
Attributes:
-
features(MultiModalFeatures | None) –Multimodal hashes and placeholder positions (populated for MM inputs).
-
sampling_params(SamplingParams) –The sampling parameters for the model.
-
token_ids(list[int]) –The token ids to generate text from.
Source code in vllm/entrypoints/serve/disagg/protocol.py
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 | |
features = None class-attribute instance-attribute ¶
Multimodal hashes and placeholder positions (populated for MM inputs).
sampling_params instance-attribute ¶
The sampling parameters for the model.
token_ids = Field(min_length=1) class-attribute instance-attribute ¶
The token ids to generate text from.
is_sampling_param_provided(name) ¶
Whether the caller explicitly set sampling_params.<name>.
For requests parsed from a JSON body, this reflects the raw input dict. For requests constructed with a pre-built SamplingParams instance, all fields are considered provided so server-side defaults do not clobber values already resolved upstream.
Source code in vllm/entrypoints/serve/disagg/protocol.py
MultiModalFeatures ¶
Bases: BaseModel
Lightweight multimodal metadata produced by the render step.
Carries hashes (for cache lookup / identification) and placeholder positions so the downstream /generate service knows where in the token sequence each multimodal item lives.
Attributes:
-
kwargs_data(dict[str, list[str | None]] | None) –Per-modality serialized tensor data.
-
mm_hashes(dict[str, list[str]]) –Per-modality item hashes, e.g.
{"image": ["abc", "def"]}. -
mm_placeholders(dict[str, list[PlaceholderRangeInfo]]) –Per-modality placeholder ranges in the token sequence.
Source code in vllm/entrypoints/serve/disagg/protocol.py
kwargs_data = None class-attribute instance-attribute ¶
Per-modality serialized tensor data.
Each value is a list parallel to mm_hashes[modality]. A str entry is a base64-encoded MultiModalKwargsItem; None means the item should be resolved from cache. The entire field is None for metadata-only (cache-hit) responses.
mm_hashes instance-attribute ¶
Per-modality item hashes, e.g. {"image": ["abc", "def"]}.
mm_placeholders instance-attribute ¶
Per-modality placeholder ranges in the token sequence.