vllm.model_executor.models.clip
Classes:
- CLIPAttention
- CLIPEncoder – Transformer encoder consisting of config.num_hidden_layers self-attention layers.
- CLIPImagePixelInputs – Dimensions: bn, c, h, w
CLIPAttention
Bases: Module
Methods:
- forward – Input shape: Batch x Time x Channel
Source code in vllm/model_executor/models/clip.py
forward(hidden_states)
Input shape: Batch x Time x Channel
Source code in vllm/model_executor/models/clip.py
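To illustrate what a Batch x Time x Channel input means for multi-head self-attention, here is a minimal pure-Python sketch. It assumes identity query/key/value projections and no attention mask, so it is not vLLM's CLIPAttention (which applies learned projections and runs on tensors). The function names multi_head_self_attention and softmax are hypothetical. The channel dimension is split evenly across heads, each head runs scaled dot-product attention over the time dimension, and the output keeps the input shape.

```python
import math
from typing import List

# [batch][time][channel]
Tensor3D = List[List[List[float]]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(hidden_states: Tensor3D, num_heads: int) -> Tensor3D:
    """Toy self-attention over a Batch x Time x Channel input.

    Assumption: queries, keys and values are the input itself (identity
    projections) and no mask is applied; a real attention layer applies
    learned q/k/v and output projections.
    """
    out: Tensor3D = []
    for seq in hidden_states:  # one sequence: Time x Channel
        channels = len(seq[0])
        if channels % num_heads != 0:
            raise ValueError("channel dim must be divisible by num_heads")
        head_dim = channels // num_heads
        scale = 1.0 / math.sqrt(head_dim)
        seq_out = [[0.0] * channels for _ in seq]
        for h in range(num_heads):
            lo = h * head_dim
            # This head's channel slice of every token (used as q, k and v).
            q = [tok[lo:lo + head_dim] for tok in seq]
            for t, qt in enumerate(q):
                # Scaled dot product of token t against every token in the sequence.
                scores = [scale * sum(a * b for a, b in zip(qt, k)) for k in q]
                weights = softmax(scores)
                for d in range(head_dim):
                    seq_out[t][lo + d] = sum(w * v[d] for w, v in zip(weights, q))
        out.append(seq_out)
    return out
```

Splitting channels into num_heads slices is why the channel size must be divisible by the head count; the output has the same Batch x Time x Channel shape as the input.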
CLIPEncoder
Bases: Module
Transformer encoder consisting of config.num_hidden_layers self-attention layers. Each layer is a CLIPEncoderLayer.
Parameters:
- config (CLIPTextConfig | CLIPVisionConfig) – CLIPConfig
Source code in vllm/model_executor/models/clip.py
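The stacking described for CLIPEncoder can be sketched in plain Python as below. This is a hypothetical illustration, not vLLM's implementation: ToyEncoderConfig, ToyEncoder and the make_layer factory are invented names, and each layer is a plain callable standing in for one CLIPEncoderLayer. The encoder builds config.num_hidden_layers layers and feeds each layer's output into the next.

```python
from dataclasses import dataclass
from typing import Callable, List

Hidden = List[float]
Layer = Callable[[Hidden], Hidden]


@dataclass
class ToyEncoderConfig:
    # Only the field the docstring relies on: config.num_hidden_layers.
    num_hidden_layers: int


class ToyEncoder:
    """Hypothetical stack of config.num_hidden_layers layers applied in order."""

    def __init__(self, config: ToyEncoderConfig, make_layer: Callable[[int], Layer]):
        # One layer per index, each standing in for a CLIPEncoderLayer.
        self.layers: List[Layer] = [make_layer(i) for i in range(config.num_hidden_layers)]

    def forward(self, hidden_states: Hidden) -> Hidden:
        # Each layer consumes the previous layer's output.
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states
```

For example, ToyEncoder(ToyEncoderConfig(num_hidden_layers=3), lambda i: (lambda h: [x + i + 1 for x in h])).forward([0.0]) applies three layers that add 1, 2 and 3 in turn.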
CLIPImagePixelInputs
Bases: TensorSchema
Dimensions:
- bn: Batch size * number of images
- c: Number of channels (3)
- h: Height of each image
- w: Width of each image
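The dimensions above describe a 4-D pixel tensor of shape (bn, c, h, w). As a minimal sketch, the hypothetical helper check_pixel_values_shape below validates such a shape using only the documented constraints (four dimensions, c equal to 3). It does not reproduce vLLM's TensorSchema validation.

```python
from typing import Sequence, Tuple


def check_pixel_values_shape(shape: Sequence[int], num_channels: int = 3) -> Tuple[int, int, int, int]:
    """Validate a (bn, c, h, w) shape and return it as a tuple.

    Hypothetical helper: it mirrors only the documented dimensions and is
    not vLLM's TensorSchema validation.
    """
    if len(shape) != 4:
        raise ValueError(f"expected 4 dims (bn, c, h, w), got {len(shape)}")
    bn, c, h, w = shape
    if c != num_channels:
        raise ValueError(f"expected c == {num_channels}, got {c}")
    return bn, c, h, w


# bn flattens batch and images: e.g. 2 prompts with 3 images each gives bn = 6.
batch_size, images_per_prompt = 2, 3
shape = check_pixel_values_shape((batch_size * images_per_prompt, 3, 224, 224))
```

Here 224 x 224 is only an illustrative image size; h and w depend on the model's vision config.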