`vllm.model_executor.models.interfaces_base` ¶

Classes:

VllmModel – The interface required for all models in vLLM.

VllmModelForPooling – The interface required for all pooling models in vLLM.

VllmModelForTextGeneration – The interface required for all generative models in vLLM.

Functions:

attn_type – Decorator to set VllmModelForPooling.attn_type.

default_pooling_type – Decorator to set VllmModelForPooling.default_*_pooling_type.

`VllmModel` ¶

Bases: Protocol[T_co]

The interface required for all models in vLLM.

Methods:

embed_input_ids – Apply token embeddings to input_ids.

Source code in vllm/model_executor/models/interfaces_base.py

@runtime_checkable
class VllmModel(Protocol[T_co]):
    """The interface required for all models in vLLM."""

    def __init__(self, vllm_config: VllmConfig, prefix: str = "") -> None: ...

    def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:
        """Apply token embeddings to `input_ids`."""
        ...

    def forward(self, input_ids: torch.Tensor, positions: torch.Tensor) -> T_co: ...
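Because the protocol is decorated with `@runtime_checkable`, an `isinstance` check against it only verifies that the required methods exist; it does not inspect signatures or return types. The following is a minimal, self-contained sketch of that behavior. `ModelLike`, `ToyModel`, and `NotAModel` are hypothetical stand-ins, and plain lists replace `torch.Tensor` so the example runs without vLLM or PyTorch installed.

```python
from typing import Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class ModelLike(Protocol[T_co]):
    """Hypothetical stand-in that mirrors the shape of VllmModel."""

    def embed_input_ids(self, input_ids: list[int]) -> list[float]: ...

    def forward(self, input_ids: list[int], positions: list[int]) -> T_co: ...


class ToyModel:
    """Satisfies the protocol structurally; it does not inherit from it."""

    def __init__(self, vllm_config: object = None, prefix: str = "") -> None:
        self.prefix = prefix

    def embed_input_ids(self, input_ids: list[int]) -> list[float]:
        # Toy "embedding": cast each token id to a float.
        return [float(i) for i in input_ids]

    def forward(self, input_ids: list[int], positions: list[int]) -> list[float]:
        return self.embed_input_ids(input_ids)


class NotAModel:
    """Lacks embed_input_ids, so the runtime check fails."""

    def forward(self, input_ids: list[int], positions: list[int]) -> list[int]:
        return input_ids


print(isinstance(ToyModel(), ModelLike))   # structural match
print(isinstance(NotAModel(), ModelLike))  # a required method is missing
```

Note that `__init__` is not part of the runtime check: Python's `typing` module excludes it from the set of protocol members it looks for.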

`embed_input_ids(input_ids)` ¶

Apply token embeddings to input_ids.

Source code in vllm/model_executor/models/interfaces_base.py

def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:
    """Apply token embeddings to `input_ids`."""
    ...

`VllmModelForPooling` ¶

Bases: VllmModel[T_co], Protocol[T_co]

The interface required for all pooling models in vLLM.

Attributes:

attn_type (AttnTypeStr) – Indicates the vllm.config.model.ModelConfig.attn_type to use by default.

default_seq_pooling_type (SequencePoolingType) – Indicates the vllm.config.pooler.PoolerConfig.seq_pooling_type to use by default.

default_tok_pooling_type (TokenPoolingType) – Indicates the vllm.config.pooler.PoolerConfig.tok_pooling_type to use by default.

is_pooling_model (Literal[True]) – A flag that indicates this model supports pooling.

pooler (Pooler) – The pooler is only called on TP rank 0.

score_type (ScoreType) – Indicates the vllm.config.model.ModelConfig.score_type to use by default.

Source code in vllm/model_executor/models/interfaces_base.py

@runtime_checkable
class VllmModelForPooling(VllmModel[T_co], Protocol[T_co]):
    """The interface required for all pooling models in vLLM."""

    is_pooling_model: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports pooling.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """

    default_seq_pooling_type: ClassVar[SequencePoolingType] = "LAST"
    """
    Indicates the [vllm.config.pooler.PoolerConfig.seq_pooling_type][]
    to use by default.

    You can use the
    [vllm.model_executor.models.interfaces_base.default_pooling_type][]
    decorator to conveniently set this field.
    """

    default_tok_pooling_type: ClassVar[TokenPoolingType] = "ALL"
    """
    Indicates the [vllm.config.pooler.PoolerConfig.tok_pooling_type][]
    to use by default.

    You can use the
    [vllm.model_executor.models.interfaces_base.default_pooling_type][]
    decorator to conveniently set this field.
    """

    attn_type: ClassVar[AttnTypeStr] = "decoder"
    """
    Indicates the
    [vllm.config.model.ModelConfig.attn_type][]
    to use by default.

    You can use the
    [vllm.model_executor.models.interfaces_base.attn_type][]
    decorator to conveniently set this field.
    """

    score_type: ClassVar[ScoreType] = "bi-encoder"
    """
    Indicates the
    [vllm.config.model.ModelConfig.score_type][]
    to use by default.

    Scoring API handles score/rerank for:\n
    - "classify" task (score_type: cross-encoder models)\n
    - "embed" task (score_type: bi-encoder models)\n
    - "token_embed" task (score_type: late interaction models)\n

    score_type defaults to bi-encoder, then the Score API uses the "embed" task.\n
    If you set score_type to cross-encoder via 
    [vllm.model_executor.models.interfaces.SupportsCrossEncoding][], 
    then the Score API uses the "score" task.\n
    If you set score_type to late-interaction via 
    [vllm.model_executor.models.interfaces.SupportsLateInteraction][], 
    then the Score API uses the "token_embed" task.\n
    """

    pooler: Pooler
    """The pooler is only called on TP rank 0."""

`attn_type = 'decoder'` `class-attribute` ¶

Indicates the vllm.config.model.ModelConfig.attn_type to use by default.

You can use the vllm.model_executor.models.interfaces_base.attn_type decorator to conveniently set this field.

`default_seq_pooling_type = 'LAST'` `class-attribute` ¶

Indicates the vllm.config.pooler.PoolerConfig.seq_pooling_type to use by default.

You can use the vllm.model_executor.models.interfaces_base.default_pooling_type decorator to conveniently set this field.

`default_tok_pooling_type = 'ALL'` `class-attribute` ¶

Indicates the vllm.config.pooler.PoolerConfig.tok_pooling_type to use by default.

You can use the vllm.model_executor.models.interfaces_base.default_pooling_type decorator to conveniently set this field.

`is_pooling_model = True` `class-attribute` ¶

A flag that indicates this model supports pooling.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.
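The note about the MRO can be illustrated with a small sketch: a class that explicitly inherits from the protocol picks up the flag and the other class-level defaults, while a class that only matches structurally has to define every attribute itself. `PoolingLike`, `InheritsFlag`, `DuckTyped`, and `MissingPooler` are hypothetical stand-ins, not vLLM classes, and `object()` takes the place of a real `Pooler`.

```python
from typing import ClassVar, Literal, Protocol, runtime_checkable


@runtime_checkable
class PoolingLike(Protocol):
    """Hypothetical stand-in that mirrors part of VllmModelForPooling."""

    is_pooling_model: ClassVar[Literal[True]] = True
    default_seq_pooling_type: ClassVar[str] = "LAST"
    pooler: object

    def forward(self, input_ids: list[int]) -> list[float]: ...


class InheritsFlag(PoolingLike):
    """The protocol is in the MRO, so the class-level defaults are inherited."""

    def __init__(self) -> None:
        self.pooler = object()  # placeholder for a real pooler

    def forward(self, input_ids: list[int]) -> list[float]:
        return [float(i) for i in input_ids]


class DuckTyped:
    """No inheritance: every attribute must be spelled out by hand."""

    is_pooling_model = True
    default_seq_pooling_type = "LAST"

    def __init__(self) -> None:
        self.pooler = object()

    def forward(self, input_ids: list[int]) -> list[float]:
        return [float(i) for i in input_ids]


class MissingPooler:
    """Has the flags but never sets `pooler`, so the check fails."""

    is_pooling_model = True
    default_seq_pooling_type = "LAST"

    def forward(self, input_ids: list[int]) -> list[float]:
        return []


print(InheritsFlag.is_pooling_model)               # inherited from the protocol
print(isinstance(DuckTyped(), PoolingLike))        # all members present
print(isinstance(MissingPooler(), PoolingLike))    # `pooler` is absent
```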

`pooler` `instance-attribute` ¶

The pooler is only called on tensor-parallel (TP) rank 0.

`score_type = 'bi-encoder'` `class-attribute` ¶

Indicates the vllm.config.model.ModelConfig.score_type to use by default.

The Score API handles score/rerank requests for the following tasks:

- "classify" task (score_type: cross-encoder models)
- "embed" task (score_type: bi-encoder models)
- "token_embed" task (score_type: late interaction models)

score_type defaults to bi-encoder, in which case the Score API uses the "embed" task.

If you set score_type to cross-encoder via vllm.model_executor.models.interfaces.SupportsCrossEncoding, then the Score API uses the "score" task.

If you set score_type to late-interaction via vllm.model_executor.models.interfaces.SupportsLateInteraction, then the Score API uses the "token_embed" task.
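The relationship between `score_type` and the task can be pictured as a lookup table. The sketch below is illustrative only and is not vLLM's dispatch code: `SCORE_TYPE_TO_TASK` and `task_for` are hypothetical names, and the task strings are copied from the list above (the text also refers to the cross-encoder task as "score"; the list's "classify" is used here).

```python
# Hypothetical lookup table; keys are the score_type values named in the text.
SCORE_TYPE_TO_TASK = {
    "cross-encoder": "classify",
    "bi-encoder": "embed",
    "late-interaction": "token_embed",
}


def task_for(model_cls: type) -> str:
    """Return the task for a model class, defaulting to bi-encoder."""
    score_type = getattr(model_cls, "score_type", "bi-encoder")
    return SCORE_TYPE_TO_TASK[score_type]


class ToyEmbedder:
    """No score_type set, so the bi-encoder default applies."""


class ToyReranker:
    score_type = "cross-encoder"


class ToyLateInteraction:
    score_type = "late-interaction"


print(task_for(ToyEmbedder))
print(task_for(ToyReranker))
print(task_for(ToyLateInteraction))
```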

`VllmModelForTextGeneration` ¶

Bases: VllmModel[T], Protocol[T]

The interface required for all generative models in vLLM.

Methods:

compute_logits – Return None if TP rank > 0.

Source code in vllm/model_executor/models/interfaces_base.py

@runtime_checkable
class VllmModelForTextGeneration(VllmModel[T], Protocol[T]):
    """The interface required for all generative models in vLLM."""

    def compute_logits(
        self,
        hidden_states: T,
    ) -> T | None:
        """Return `None` if TP rank > 0."""
        ...

`compute_logits(hidden_states)` ¶

Return None if the tensor-parallel (TP) rank is greater than 0.

Source code in vllm/model_executor/models/interfaces_base.py

def compute_logits(
    self,
    hidden_states: T,
) -> T | None:
    """Return `None` if TP rank > 0."""
    ...
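Since `compute_logits` may return `None`, callers have to handle that case before using the result. A toy sketch of that contract follows; `ToyGenerativeModel`, its `tp_rank` argument, and `sample_greedy` are hypothetical, plain lists stand in for tensors, and nothing here reflects how vLLM actually gathers logits across ranks.

```python
from typing import Optional


class ToyGenerativeModel:
    """Hypothetical model that only produces logits on rank 0."""

    def __init__(self, tp_rank: int) -> None:
        self.tp_rank = tp_rank  # assumed to be supplied by the caller

    def compute_logits(self, hidden_states: list[float]) -> Optional[list[float]]:
        if self.tp_rank > 0:
            return None  # non-zero ranks return nothing
        # Toy projection: scale each hidden value.
        return [2.0 * h for h in hidden_states]


def sample_greedy(
    model: ToyGenerativeModel, hidden_states: list[float]
) -> Optional[int]:
    """Pick the argmax index, or None when this rank has no logits."""
    logits = model.compute_logits(hidden_states)
    if logits is None:
        return None
    return max(range(len(logits)), key=logits.__getitem__)


print(sample_greedy(ToyGenerativeModel(tp_rank=0), [0.1, 0.9, 0.3]))
print(sample_greedy(ToyGenerativeModel(tp_rank=1), [0.1, 0.9, 0.3]))
```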

`attn_type(attn_type)` ¶

Decorator to set VllmModelForPooling.attn_type.

Source code in vllm/model_executor/models/interfaces_base.py

def attn_type(attn_type: AttnTypeStr):
    """Decorator to set `VllmModelForPooling.attn_type`."""

    def func(model: _T) -> _T:
        model.attn_type = attn_type  # type: ignore
        return model

    return func
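A usage sketch for the decorator, written so it runs without vLLM installed: the decorator body is reproduced from the source above, while `ToyPoolingBase` and `ToyEncoder` are hypothetical classes. The value `"encoder_only"` is assumed to be a valid `AttnTypeStr`; check the type alias in your vLLM version before relying on it.

```python
from typing import TypeVar

_T = TypeVar("_T", bound=type)  # assumed bound; the source only shows the name


def attn_type(attn_type: str):
    """Local copy of the decorator shown above, typed with str for brevity."""

    def func(model: _T) -> _T:
        model.attn_type = attn_type  # type: ignore
        return model

    return func


class ToyPoolingBase:
    """Hypothetical base class carrying the documented default."""

    attn_type = "decoder"


@attn_type("encoder_only")  # assumed-valid value, see the note above
class ToyEncoder(ToyPoolingBase):
    pass


print(ToyPoolingBase.attn_type)  # the base class is left unchanged
print(ToyEncoder.attn_type)      # overridden on the decorated class only
```

The decorator assigns the attribute on the decorated class itself, so the base class and sibling subclasses keep their own values.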

`default_pooling_type(*, seq_pooling_type='LAST', tok_pooling_type='ALL')` ¶

Decorator to set VllmModelForPooling.default_*_pooling_type.

Source code in vllm/model_executor/models/interfaces_base.py

def default_pooling_type(
    *,
    seq_pooling_type: SequencePoolingType = "LAST",
    tok_pooling_type: TokenPoolingType = "ALL",
):
    """Decorator to set `VllmModelForPooling.default_*_pooling_type`."""

    def func(model: _T) -> _T:
        model.default_seq_pooling_type = seq_pooling_type  # type: ignore
        model.default_tok_pooling_type = tok_pooling_type  # type: ignore
        return model

    return func
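The bare `*` in the signature makes both parameters keyword-only, and any parameter left out falls back to its default. A short sketch under the same assumptions as before: the decorator is copied locally so the example runs standalone, the toy classes are hypothetical, and `"CLS"` is assumed to be a valid `SequencePoolingType`.

```python
from typing import TypeVar

_T = TypeVar("_T", bound=type)  # assumed bound; the source only shows the name


def default_pooling_type(
    *,
    seq_pooling_type: str = "LAST",
    tok_pooling_type: str = "ALL",
):
    """Local copy of the decorator shown above, typed with str for brevity."""

    def func(model: _T) -> _T:
        model.default_seq_pooling_type = seq_pooling_type  # type: ignore
        model.default_tok_pooling_type = tok_pooling_type  # type: ignore
        return model

    return func


@default_pooling_type(seq_pooling_type="CLS")  # assumed-valid value
class ToyBertEmbedder:
    pass


@default_pooling_type()  # both defaults apply
class ToyDefaultEmbedder:
    pass


print(ToyBertEmbedder.default_seq_pooling_type)
print(ToyBertEmbedder.default_tok_pooling_type)

try:
    default_pooling_type("CLS")  # positional use is rejected
except TypeError as exc:
    print("TypeError:", exc)
```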
