vllm.v1.worker.gpu.mm.encoder_cache
Classes:
EncoderCache
Methods:
- reset_encoder_cache – Clear the GPU-side encoder cache storing vision embeddings.
- reset_mm_cache – Clear the multi-modal cache that was used during profiling but is no longer needed during inference.
Source code in vllm/v1/worker/gpu/mm/encoder_cache.py
reset_encoder_cache()
Clear the GPU-side encoder cache storing vision embeddings.
This should be called when model weights are updated to ensure stale embeddings computed with old weights are not reused.
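The staleness problem described above can be illustrated with a minimal sketch. This is not vLLM's implementation: `ToyEncoderCache`, `set_encoder`, and `make_encoder` are hypothetical names invented for illustration, and the "encoder" is a trivial function standing in for a vision model. Only the method name `reset_encoder_cache` is taken from the documentation.

```python
from typing import Callable, Dict, List


class ToyEncoderCache:
    """Hypothetical stand-in for an encoder-output cache (not vLLM's EncoderCache)."""

    def __init__(self, encode: Callable[[str], List[float]]) -> None:
        self._encode = encode
        self._outputs: Dict[str, List[float]] = {}

    def get(self, mm_hash: str) -> List[float]:
        # Run the encoder only on a cache miss; otherwise reuse the stored embedding.
        if mm_hash not in self._outputs:
            self._outputs[mm_hash] = self._encode(mm_hash)
        return self._outputs[mm_hash]

    def set_encoder(self, encode: Callable[[str], List[float]]) -> None:
        # Simulates a weight update. Cached embeddings are deliberately left in place.
        self._encode = encode

    def reset_encoder_cache(self) -> None:
        # Drop every cached embedding so later lookups recompute them.
        self._outputs.clear()


def make_encoder(weight: float) -> Callable[[str], List[float]]:
    # Toy "encoder": the embedding depends on the current weight.
    return lambda mm_hash: [weight * len(mm_hash)]


cache = ToyEncoderCache(make_encoder(1.0))
before = cache.get("image-a")         # computed with the old weights
cache.set_encoder(make_encoder(2.0))  # weights change
stale = cache.get("image-a")          # cache hit: still the old embedding
cache.reset_encoder_cache()
fresh = cache.get("image-a")          # recomputed with the new weights
```

In this sketch, `stale` equals `before` even though the weights changed, while `fresh` reflects the new weights. That is the situation the reset is meant to prevent.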
reset_mm_cache()
Clear the multi-modal cache that was used during profiling but is no longer needed during inference.