vllm.config.pooler
Classes:
- PoolerConfig – Controls the behavior of output pooling in pooling models.
PoolerConfig
Controls the behavior of output pooling in pooling models.
Methods:
- compute_hash – WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Attributes:
- dimensions (int | None) – Reduce the dimensions of embeddings if the model supports Matryoshka representation.
- enable_chunked_processing (bool) – Whether to enable chunked processing for long inputs that exceed the model's maximum position embeddings.
- logit_mean (float | None) – If provided, subtract this value from classification logits before activation.
- logit_sigma (float | None) – If provided, divide the classification logits by this value after mean subtraction.
- max_embed_len (int | None) – Maximum input length allowed for embedding generation.
- pooling_type (SequencePoolingType | TokenPoolingType | None) – The pooling method used for pooling.
- returned_token_ids (list[int] | None) – A list of indices for the vocabulary dimensions to be extracted.
- seq_pooling_type (SequencePoolingType | None) – The pooling method used for sequence pooling.
- step_tag_id (int | None) – If set, only the score corresponding to the step_tag_id in the generated sentence is returned.
- task (PoolingTask | None) – The task used for pooling.
- tok_pooling_type (TokenPoolingType | None) – The pooling method used for tokenwise pooling.
- use_activation (bool | None) – Whether to apply an activation function to the pooler outputs.
Source code in vllm/config/pooler.py
dimensions = None class-attribute instance-attribute
Reduce the dimensions of embeddings if the model supports Matryoshka representation. Defaults to None.
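To make the idea of dimension reduction concrete, the following is a minimal sketch of how a Matryoshka-style embedding is commonly shortened: keep the leading components and re-normalize. The helper name `truncate_embedding` and the L2 re-normalization step are assumptions for illustration; they are not taken from the vLLM source and may differ from what the pooler does internally.

```python
from __future__ import annotations

import math


def truncate_embedding(embedding: list[float], dimensions: int | None) -> list[float]:
    """Hypothetical helper: keep the first `dimensions` components, then L2-normalize."""
    if dimensions is None:
        # No reduction requested: return the embedding unchanged.
        return list(embedding)
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    head = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    # Assumption: the shortened vector is re-normalized to unit length.
    return [x / norm for x in head] if norm > 0 else head


full = [3.0, 4.0, 12.0, 0.5]
short = truncate_embedding(full, 2)
```

Here `short` has two components and unit length, while passing `None` leaves the vector as it was.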
enable_chunked_processing = False class-attribute instance-attribute
Whether to enable chunked processing for long inputs that exceed the model's maximum position embeddings. When enabled, long inputs will be split into chunks, processed separately, and then aggregated using weighted averaging. This allows embedding models to handle arbitrarily long text without CUDA errors. Defaults to False.
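The split-then-aggregate flow described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and weighting each chunk by its token count is an assumption about what "weighted averaging" means here, not a statement about vLLM's implementation.

```python
from __future__ import annotations


def split_into_chunks(token_ids: list[int], chunk_size: int) -> list[list[int]]:
    """Split a long token sequence into consecutive chunks of at most chunk_size."""
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]


def aggregate_chunks(chunk_embeddings: list[list[float]],
                     chunk_lengths: list[int]) -> list[float]:
    """Average chunk embeddings, weighting each by its token count (assumed weights)."""
    total = sum(chunk_lengths)
    dim = len(chunk_embeddings[0])
    return [
        sum(emb[d] * n for emb, n in zip(chunk_embeddings, chunk_lengths)) / total
        for d in range(dim)
    ]


chunks = split_into_chunks(list(range(10)), chunk_size=4)
lengths = [len(c) for c in chunks]  # three chunks; the last one is shorter
# Stand-in embeddings; a real model would produce one vector per chunk.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = aggregate_chunks(embeddings, lengths)
```

With this weighting, a short trailing chunk contributes proportionally less to the final embedding than a full-length chunk.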
logit_mean = None class-attribute instance-attribute
If provided, subtract this value from classification logits before activation. Used for affine score calibration (Platt scaling): activation((logit - logit_mean) / logit_sigma). Defaults to None.
logit_sigma = None class-attribute instance-attribute
If provided, divide the classification logits by this value after mean subtraction. Used for affine score calibration (Platt scaling): activation((logit - logit_mean) / logit_sigma). Defaults to None.
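As a worked illustration of the formula activation((logit - logit_mean) / logit_sigma), the sketch below applies the two optional steps in order. The function name `calibrated_score` is hypothetical, and a sigmoid is assumed as the activation purely for the example; the actual activation depends on the model and pooler.

```python
from __future__ import annotations

import math


def calibrated_score(logit: float,
                     logit_mean: float | None = None,
                     logit_sigma: float | None = None) -> float:
    """Hypothetical helper: affine calibration followed by a sigmoid (assumed activation)."""
    if logit_mean is not None:
        logit = logit - logit_mean   # shift before activation
    if logit_sigma is not None:
        logit = logit / logit_sigma  # scale after mean subtraction
    return 1.0 / (1.0 + math.exp(-logit))


raw = calibrated_score(2.0)                       # no calibration applied
centered = calibrated_score(2.0, logit_mean=2.0, logit_sigma=4.0)
shifted = calibrated_score(6.0, logit_mean=2.0, logit_sigma=4.0)
```

A logit equal to `logit_mean` maps to the midpoint of the sigmoid, and `logit_sigma` controls how quickly scores move away from that midpoint.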
max_embed_len = None class-attribute instance-attribute
Maximum input length allowed for embedding generation. When set, embedding models accept inputs longer than max_model_len, up to max_embed_len. An input that exceeds max_embed_len is handled according to the original max_model_len validation logic. Defaults to None (i.e. set to max_model_len).
pooling_type = None class-attribute instance-attribute
The pooling method used for pooling.
If set, seq_pooling_type or tok_pooling_type is automatically populated from this field. Alternatively, users can set seq_pooling_type and tok_pooling_type explicitly.
This field is mainly for user convenience. Internal code should always use seq_pooling_type or tok_pooling_type instead of pooling_type.
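The relationship between the convenience field and the two specific fields can be sketched with a small stand-in class. Everything here is hypothetical: `PoolingSelection` is not the real PoolerConfig, and the value sets `SEQ_TYPES` and `TOK_TYPES` are assumed for illustration; consult the definitions of SequencePoolingType and TokenPoolingType for the actual members.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed, illustrative value sets; not taken from this page.
SEQ_TYPES = {"CLS", "LAST", "MEAN"}
TOK_TYPES = {"ALL", "STEP"}


@dataclass
class PoolingSelection:
    """Hypothetical stand-in showing how pooling_type could populate the specific fields."""
    pooling_type: str | None = None
    seq_pooling_type: str | None = None
    tok_pooling_type: str | None = None

    def __post_init__(self) -> None:
        if self.pooling_type is None:
            return
        if self.pooling_type in SEQ_TYPES:
            # Fill the sequence-level field only if the user did not set it.
            if self.seq_pooling_type is None:
                self.seq_pooling_type = self.pooling_type
        elif self.pooling_type in TOK_TYPES:
            if self.tok_pooling_type is None:
                self.tok_pooling_type = self.pooling_type
        else:
            raise ValueError(f"unknown pooling_type: {self.pooling_type!r}")


seq_sel = PoolingSelection(pooling_type="MEAN")
tok_sel = PoolingSelection(pooling_type="ALL")
```

Code that consumes such a config would then read only `seq_pooling_type` or `tok_pooling_type`, in line with the guidance above.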
returned_token_ids = None class-attribute instance-attribute
A list of indices for the vocabulary dimensions to be extracted, such as the token IDs of good_token and bad_token in the math-shepherd-mistral-7b-prm model.
seq_pooling_type = None class-attribute instance-attribute
The pooling method used for sequence pooling.
step_tag_id = None class-attribute instance-attribute
If set, only the score corresponding to the step_tag_id in the generated sentence should be returned. Otherwise, the scores for all tokens are returned.
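The filtering described for step_tag_id, together with the vocabulary selection described for returned_token_ids, can be pictured with the sketch below. The function `step_scores` and the toy inputs are hypothetical, and the order of the two operations is an assumption made for the example rather than a description of vLLM's pooler.

```python
from __future__ import annotations


def step_scores(token_ids: list[int],
                logits: list[list[float]],
                returned_token_ids: list[int] | None = None,
                step_tag_id: int | None = None) -> list[list[float]]:
    """Hypothetical helper: select per-token scores by step tag and vocabulary index."""
    scores = []
    for tok, row in zip(token_ids, logits):
        # Keep only positions holding the step tag, if one is configured.
        if step_tag_id is not None and tok != step_tag_id:
            continue
        # Keep only the requested vocabulary dimensions, if configured.
        if returned_token_ids is not None:
            row = [row[i] for i in returned_token_ids]
        scores.append(list(row))
    return scores


token_ids = [5, 9, 7, 9]  # toy sequence; 9 plays the role of the step tag
logits = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.6, 0.7, 0.8],
    [5.0, 6.0, 7.0, 8.0],
]
selected = step_scores(token_ids, logits, returned_token_ids=[1, 3], step_tag_id=9)
everything = step_scores(token_ids, logits)
```

With both options unset, scores for all tokens and all vocabulary dimensions are returned, matching the "otherwise" case above.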
task = None class-attribute instance-attribute
The task used for pooling.
tok_pooling_type = None class-attribute instance-attribute
The pooling method used for tokenwise pooling.
use_activation = None class-attribute instance-attribute
Whether to apply an activation function to the pooler outputs. None uses the pooler's default, which is True in most cases.
compute_hash()
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
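The factors-list pattern mentioned in the warning can be sketched as below. This is an assumption-laden illustration: `compute_hash_sketch`, the choice of SHA-256, and the example factor values are all hypothetical, and nothing here states which fields the real PoolerConfig includes in its factors list.

```python
from __future__ import annotations

import hashlib
from typing import Any


def compute_hash_sketch(factors: list[Any]) -> str:
    """Hypothetical helper: hash the string form of every graph-affecting factor."""
    # Only fields that change the computation graph belong in `factors`.
    return hashlib.sha256(str(factors).encode()).hexdigest()


hash_a = compute_hash_sketch(["example_field", 128])
hash_b = compute_hash_sketch(["example_field", 128])
hash_c = compute_hash_sketch(["example_field", 256])
```

Equal factor lists yield equal hashes, so a newly added field that affects the graph but is left out of the list would go unnoticed by anything keyed on the hash.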