Skip to content

`vllm.entrypoints.openai.parser.harmony_utils` ¶

Functions:

auto_drop_analysis_messages –

Harmony models expect the analysis messages (representing raw chain of thought) to
build_harmony_preamble –

Build the standard Harmony system/developer prefix for a request.
extract_instructions_from_messages –

Peel a leading system/developer Chat Completion or Responses message and
flatten_input_text_content –

Extract text parts from a Chat Completion or Responses API content field and
has_custom_tools –

Checks if the given tool types are custom tools
is_function_recipient –

Check whether recipient refers to a function tool call.
parse_chat_input_to_harmony_message –

Parse a message from request.messages in the Chat Completion API to
parse_chat_inputs_to_harmony_messages –

Parse a list of messages from request.messages in the Chat Completion API to
parse_chat_output –

Parse the output of a Harmony chat completion into reasoning and final content.

`auto_drop_analysis_messages(msgs)` ¶

Harmony models expect the analysis messages (representing raw chain of thought) to be dropped after an assistant message to the final channel is produced from the reasoning of those messages.

The openai-harmony library does this if the very last assistant message is to the final channel, but it does not handle the case where we're in longer multi-turn conversations and the client gave us reasoning content from previous turns of the conversation with multiple assistant messages to the final channel in the conversation.

So, we find the index of the last assistant message to the final channel and drop all analysis messages that precede it, leaving only the analysis messages that are relevant to the current part of the conversation.

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def auto_drop_analysis_messages(msgs: list[Message]) -> list[Message]:
    """
    Harmony models expect the analysis messages (representing raw chain of thought) to
    be dropped after an assistant message to the final channel is produced from the
    reasoning of those messages.

    The openai-harmony library does this if the very last assistant message is to the
    final channel, but it does not handle the case where we're in longer multi-turn
    conversations and the client gave us reasoning content from previous turns of
    the conversation with multiple assistant messages to the final channel in the
    conversation.

    So, we find the index of the last assistant message to the final channel and drop
    all analysis messages that precede it, leaving only the analysis messages that
    are relevant to the current part of the conversation.
    """
    last_assistant_final_index = -1
    for i in range(len(msgs) - 1, -1, -1):
        msg = msgs[i]
        if msg.author.role == "assistant" and msg.channel == "final":
            last_assistant_final_index = i
            break

    cleaned_msgs: list[Message] = []
    for i, msg in enumerate(msgs):
        if i < last_assistant_final_index and msg.channel == "analysis":
            continue
        cleaned_msgs.append(msg)

    return cleaned_msgs

`build_harmony_preamble(*, instructions=None, tools=None, reasoning_effort=None, browser_description=None, python_description=None, container_description=None, with_custom_tools=False)` ¶

Build the standard Harmony system/developer prefix for a request.

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def build_harmony_preamble(
    *,
    instructions: str | None = None,
    tools: list[Tool | ChatCompletionToolsParam] | None = None,
    reasoning_effort: str | None = None,
    browser_description: str | None = None,
    python_description: str | None = None,
    container_description: str | None = None,
    with_custom_tools: bool = False,
) -> list[Message]:
    """
    Build the standard Harmony system/developer prefix for a request.
    """
    developer_instructions = system_instructions = None
    if envs.VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS:
        system_instructions = instructions
    else:
        developer_instructions = instructions

    messages = [
        get_system_message(
            reasoning_effort=reasoning_effort,
            browser_description=browser_description,
            python_description=python_description,
            container_description=container_description,
            instructions=system_instructions,
            with_custom_tools=with_custom_tools,
        )
    ]
    if developer_instructions or tools:
        messages.append(
            get_developer_message(
                instructions=developer_instructions,
                tools=tools,
            )
        )
    return messages

`extract_instructions_from_messages(messages)` ¶

Peel a leading system/developer Chat Completion or Responses message and flatten its instruction text.

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def extract_instructions_from_messages(
    messages: Sequence[Any],
) -> tuple[str | None, list[Any]]:
    """
    Peel a leading system/developer Chat Completion or Responses message and
    flatten its instruction text.
    """
    remaining_messages = list(messages)
    if not remaining_messages:
        return None, remaining_messages

    first_message = remaining_messages[0]
    if not isinstance(first_message, dict):
        if hasattr(first_message, "to_dict"):
            # Handle OpenAI Harmony Message
            first_message = first_message.to_dict()
        elif hasattr(first_message, "model_dump"):
            first_message = first_message.model_dump(exclude_none=True)
        else:
            raise ValueError(f"Unknown message type: {type(first_message)}")

    if first_message.get("role") not in (
        "system",
        "developer",
    ):
        return None, remaining_messages

    instructions = flatten_input_text_content(first_message.get("content"))
    return instructions, remaining_messages[1:]

`flatten_input_text_content(content)` ¶

Extract text parts from a Chat Completion or Responses API content field and flatten them into a single string. Returns None if no text content is found.

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def flatten_input_text_content(content: Any) -> str | None:
    """
    Extract text parts from a Chat Completion or Responses API content field and
    flatten them into a single string. Returns None if no text content is found.
    """
    if content is None or isinstance(content, str):
        return content
    if not isinstance(content, list):
        return None

    texts: list[str] = []
    for item in content:
        if isinstance(item, str):
            texts.append(item)
            continue
        if isinstance(item, dict):
            text = item.get("text")
            if text is not None:
                texts.append(text)
    return "".join(texts) if texts else None

`has_custom_tools(tool_types)` ¶

Checks if the given tool types are custom tools (i.e. any tool other than MCP builtin tools)

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def has_custom_tools(tool_types: set[str]) -> bool:
    """
    Checks if the given tool types are custom tools
    (i.e. any tool other than MCP builtin tools)
    """
    return not tool_types.issubset(MCP_BUILTIN_TOOLS)

`is_function_recipient(recipient, allowed_function_tool_names=None)` ¶

Check whether recipient refers to a function tool call.

The optional allowed_function_tool_names parameter is used by the Responses API to distinguish bare function-call recipients (missing the functions. prefix) from MCP tool calls. When provided, a bare recipient is only treated as a function call if it appears in the set. The Chat Completions path omits this parameter so that all bare recipients are accepted as function calls (the heuristic fallback).

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def is_function_recipient(
    recipient: str,
    allowed_function_tool_names: frozenset[str] | None = None,
) -> bool:
    """Check whether *recipient* refers to a function tool call.

    The optional *allowed_function_tool_names* parameter is used by the
    Responses API to distinguish bare function-call recipients (missing the
    ``functions.`` prefix) from MCP tool calls.  When provided, a bare
    recipient is only treated as a function call if it appears in the set.
    The Chat Completions path omits this parameter so that all bare
    recipients are accepted as function calls (the heuristic fallback).
    """
    if not recipient:
        return False
    if recipient.startswith("<|"):
        return False
    if recipient.startswith("functions."):
        return len(recipient) > len("functions.")
    if recipient == "assistant":
        return False
    if recipient in BUILTIN_TOOL_TO_MCP_SERVER_LABEL:
        return False
    first_segment = recipient.split(".", 1)[0]
    if first_segment in BUILTIN_TOOL_TO_MCP_SERVER_LABEL:
        return False
    if allowed_function_tool_names is not None:
        return recipient in allowed_function_tool_names
    return True

`parse_chat_input_to_harmony_message(chat_msg, tool_id_names=None)` ¶

Parse a message from request.messages in the Chat Completion API to Harmony messages.

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def parse_chat_input_to_harmony_message(
    chat_msg, tool_id_names: dict[str, str] | None = None
) -> list[Message]:
    """
    Parse a message from request.messages in the Chat Completion API to
    Harmony messages.
    """
    tool_id_names = tool_id_names or {}

    if not isinstance(chat_msg, dict):
        # Handle Pydantic models
        chat_msg = chat_msg.model_dump(exclude_none=True)

    role = chat_msg.get("role")
    msgs: list[Message] = []

    # Assistant message with tool calls
    tool_calls = chat_msg.get("tool_calls", [])

    if role == "assistant" and tool_calls:
        content = flatten_input_text_content(chat_msg.get("content"))
        if content:
            commentary_msg = Message.from_role_and_content(Role.ASSISTANT, content)
            commentary_msg = commentary_msg.with_channel("commentary")
            msgs.append(commentary_msg)

        reasoning = chat_msg.get("reasoning")
        if reasoning:
            analysis_msg = Message.from_role_and_content(Role.ASSISTANT, reasoning)
            analysis_msg = analysis_msg.with_channel("analysis")
            msgs.append(analysis_msg)

        for call in tool_calls:
            func = call.get("function", {})
            name = func.get("name", "")
            arguments = func.get("arguments", "") or ""
            msg = Message.from_role_and_content(Role.ASSISTANT, arguments)
            msg = msg.with_channel("commentary")
            msg = msg.with_recipient(f"functions.{name}")
            # Officially, this should be `<|constrain|>json` but there is not clear
            # evidence that improves accuracy over `json` and some anecdotes to the
            # contrary. Further testing of the different content_types is needed.
            msg = msg.with_content_type("json")
            msgs.append(msg)
        return msgs

    # Tool role message (tool output)
    if role == "tool":
        tool_call_id = chat_msg.get("tool_call_id", "")
        name = tool_id_names.get(tool_call_id, "")
        content = flatten_input_text_content(chat_msg.get("content")) or ""

        msg = (
            Message.from_author_and_content(
                Author.new(Role.TOOL, f"functions.{name}"), content
            )
            .with_channel("commentary")
            .with_recipient("assistant")
        )
        return [msg]

    # Non-tool reasoning content
    reasoning = chat_msg.get("reasoning")
    if role == "assistant" and reasoning:
        analysis_msg = Message.from_role_and_content(Role.ASSISTANT, reasoning)
        analysis_msg = analysis_msg.with_channel("analysis")
        msgs.append(analysis_msg)

    # Default: user/assistant/system messages with content
    content = chat_msg.get("content") or ""
    if content is None:
        content = ""
    if isinstance(content, str):
        contents = [TextContent(text=content)]
    else:
        # TODO: Support refusal.
        contents = [TextContent(text=c.get("text", "")) for c in content]

    # Only add assistant messages if they have content, as reasoning or tool calling
    # assistant messages were already added above.
    if role == "assistant" and contents and contents[0].text:
        msg = Message.from_role_and_contents(role, contents)
        # Send non-tool assistant messages to the final channel
        msg = msg.with_channel("final")
        msgs.append(msg)
    elif role in ("system", "developer"):
        instructions = flatten_input_text_content(chat_msg.get("content"))
        if instructions is not None:
            msg = get_system_or_developer_message(role, instructions)
            msgs.append(msg)
    # For user messages, add them directly even if no content.
    elif role != "assistant":
        msg = Message.from_role_and_contents(role, contents)
        msgs.append(msg)

    return msgs

`parse_chat_inputs_to_harmony_messages(chat_msgs)` ¶

Parse a list of messages from request.messages in the Chat Completion API to Harmony messages.

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def parse_chat_inputs_to_harmony_messages(chat_msgs: list) -> list[Message]:
    """
    Parse a list of messages from request.messages in the Chat Completion API to
    Harmony messages.
    """
    msgs: list[Message] = []
    tool_id_names: dict[str, str] = {}

    # Collect tool id to name mappings for tool response recipient values
    for chat_msg in chat_msgs:
        for tool_call in chat_msg.get("tool_calls", []):
            tool_id_names[tool_call.get("id")] = tool_call.get("function", {}).get(
                "name"
            )

    for chat_msg in chat_msgs:
        msgs.extend(parse_chat_input_to_harmony_message(chat_msg, tool_id_names))

    msgs = auto_drop_analysis_messages(msgs)
    return msgs

`parse_chat_output(token_ids)` ¶

Parse the output of a Harmony chat completion into reasoning and final content. Note that when the openai tool parser is used, serving_chat only uses this for the reasoning content and gets the final content from the tool call parser.

When the openai tool parser is not enabled, or when GptOssReasoningParser is in use,this needs to return the final content without any tool calls parsed.

Empty reasoning or final content is returned as None instead of an empty string.

Source code in vllm/entrypoints/openai/parser/harmony_utils.py

def parse_chat_output(
    token_ids: Sequence[int],
) -> tuple[str | None, str | None, bool]:
    """
    Parse the output of a Harmony chat completion into reasoning and final content.
    Note that when the `openai` tool parser is used, serving_chat only uses this
    for the reasoning content and gets the final content from the tool call parser.

    When the `openai` tool parser is not enabled, or when `GptOssReasoningParser` is
    in use,this needs to return the final content without any tool calls parsed.

    Empty reasoning or final content is returned as None instead of an empty string.
    """
    parser = parse_output_into_messages(token_ids)
    output_msgs = parser.messages
    is_tool_call = False  # TODO: update this when tool call is supported

    # Get completed messages from the parser
    # - analysis channel: hidden reasoning
    # - commentary channel without recipient (preambles): visible to user
    # - final channel: visible to user
    # - commentary with recipient (tool calls): handled separately by tool parser
    reasoning_texts = [
        msg.content[0].text for msg in output_msgs if msg.channel == "analysis"
    ]
    final_texts = [
        msg.content[0].text
        for msg in output_msgs
        if msg.channel == "final" or (msg.channel == "commentary" and not msg.recipient)
    ]

    # Extract partial messages from the parser
    if parser.current_channel == "analysis" and parser.current_content:
        reasoning_texts.append(parser.current_content)
    elif parser.current_channel == "final" and parser.current_content:
        final_texts.append(parser.current_content)
    elif (
        parser.current_channel == "commentary"
        and not parser.current_recipient
        and parser.current_content
    ):
        # Preambles (commentary without recipient) are visible to user
        final_texts.append(parser.current_content)

    # Flatten multiple messages into a single string
    reasoning: str | None = "\n".join(reasoning_texts)
    final_content: str | None = "\n".join(final_texts)

    # Return None instead of empty string since existing callers check for None
    reasoning = reasoning or None
    final_content = final_content or None

    return reasoning, final_content, is_tool_call