vllm.entrypoints.serve.dev.rlhf.api_router
Functions:
- get_world_size – Get the world size from the parallel config.
- is_paused – Return the current pause status.
- pause_generation – Pause generation requests to allow weight updates.
- resume_generation – Resume generation after a pause.
get_world_size(raw_request, include_dp=Query(True)) async
Get the world size from the parallel config.
Parameters:
- include_dp (bool, default: Query(True)) – If True (the default), returns the world size including data parallelism (TP * PP * DP). If False, returns the world size without data parallelism (TP * PP).
Source code in vllm/entrypoints/serve/dev/rlhf/api_router.py
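The two `include_dp` cases reduce to a product of the parallel sizes. A minimal sketch of that arithmetic, assuming a hypothetical `ParallelSizes` dataclass as a stand-in for the server's parallel config; its field names and the `world_size` helper are illustrative, not vLLM code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelSizes:
    """Hypothetical stand-in for the server's parallel config."""

    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1
    data_parallel_size: int = 1


def world_size(cfg: ParallelSizes, include_dp: bool = True) -> int:
    """Mirror the documented include_dp semantics of get_world_size."""
    size = cfg.tensor_parallel_size * cfg.pipeline_parallel_size  # TP * PP
    if include_dp:
        size *= cfg.data_parallel_size  # TP * PP * DP
    return size


cfg = ParallelSizes(tensor_parallel_size=4, pipeline_parallel_size=2, data_parallel_size=2)
print(world_size(cfg))                    # 16
print(world_size(cfg, include_dp=False))  # 8
```

A trainer that must address every worker would use the default (including DP), while one that only reasons about a single model replica would pass `include_dp=False`.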
is_paused(raw_request) async
Return the current pause status.
pause_generation(raw_request, mode='abort', wait_for_inflight_requests=Query(False), clear_cache=True) async
Pause generation requests to allow weight updates.
Parameters:
- mode (Annotated[PauseMode, Query()], default: 'abort') – How to handle in-flight requests:
  - "abort": Abort all in-flight requests immediately (default).
  - "wait": Wait for in-flight requests to complete.
  - "keep": Freeze requests in the queue; they resume on /resume.
- wait_for_inflight_requests (bool, default: Query(False)) – DEPRECATED. Use mode="wait" instead.
- clear_cache (Annotated[bool, Query()], default: True) – DEPRECATED. Whether to clear the KV/prefix caches after draining. Ignored when mode="keep".
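The interaction between `mode` and the two deprecated parameters can be summarized as a small resolution step. The sketch below is hypothetical: `resolve_pause_options`, `EffectivePause`, and the local `PauseMode` alias are illustrative names, not vLLM code. Two behaviors are assumptions rather than documented facts: that `wait_for_inflight_requests=True` acts like `mode="wait"` when `mode` is left at its default, and that an ignored `clear_cache` under `mode="keep"` means the caches are left intact.

```python
from typing import Literal, NamedTuple

# Local stand-in for vLLM's PauseMode type (illustrative only).
PauseMode = Literal["abort", "wait", "keep"]


class EffectivePause(NamedTuple):
    mode: str
    clear_cache: bool


def resolve_pause_options(
    mode: PauseMode = "abort",
    wait_for_inflight_requests: bool = False,
    clear_cache: bool = True,
) -> EffectivePause:
    """Hypothetical helper: fold the deprecated flags into one effective setting."""
    if mode not in ("abort", "wait", "keep"):
        raise ValueError(f"unknown pause mode: {mode!r}")
    # Assumption: the deprecated flag behaves like mode="wait" when the
    # caller left mode at its default.
    if wait_for_inflight_requests and mode == "abort":
        mode = "wait"
    # Documented: clear_cache is ignored when mode="keep". Assumption: frozen
    # requests still need their cached state, so nothing is cleared.
    if mode == "keep":
        clear_cache = False
    return EffectivePause(mode, clear_cache)


print(resolve_pause_options(wait_for_inflight_requests=True))
# EffectivePause(mode='wait', clear_cache=True)
print(resolve_pause_options(mode="keep"))
# EffectivePause(mode='keep', clear_cache=False)
```

New callers can avoid the ambiguity entirely by passing `mode` explicitly and leaving the deprecated parameters at their defaults.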
resume_generation(raw_request) async
Resume generation after a pause.
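A trainer typically brackets a weight update between a pause and a resume. The following is a minimal client-side sketch using only the standard library. It assumes both endpoints are served as POST routes at `/pause` and `/resume`; only `/resume` is named above, so the `/pause` path, the HTTP method, and the helper names (`build_request`, `update_weights_paused`) are assumptions. The `send` parameter is injectable so the flow can be exercised without a running server.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url: str, path: str, params: Optional[Dict[str, str]] = None) -> Request:
    """Build a POST request, passing options as query parameters."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="POST")


def _send(req: Request) -> bytes:
    """Send the request and return the response body, closing the connection."""
    with urlopen(req) as resp:
        return resp.read()


def update_weights_paused(
    base_url: str,
    update_weights: Callable[[], None],
    mode: str = "keep",
    send: Callable[[Request], object] = _send,
) -> None:
    """Pause generation, run a weight update, then always resume."""
    # Assumed route: "/pause" (hypothetical path; adjust to your server).
    send(build_request(base_url, "/pause", {"mode": mode}))
    try:
        update_weights()
    finally:
        # Resume even if the update raised, so the server is not left paused.
        send(build_request(base_url, "/resume"))
```

The `try`/`finally` guarantees that `/resume` is called even if the update fails, so a failed update does not leave generation paused indefinitely. With `mode="keep"`, frozen requests continue after the resume instead of being aborted.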