I'm currently working on tool-use PRs, and I'm seeing that some models are very sensitive to the exact prompt. It would therefore be useful to detect which model is being served inside the chat template and adjust the input accordingly.
For example, Llama 3.1 seems to perform better if the tool list is passed in the first user message, whereas Llama 3.2 seems to prefer the tools in the system prompt. In the Llama chat template this behavior is controlled by the tools_in_user_message flag, which can be passed in the tokenizer.apply_chat_template() call:
{%- if not tools_in_user_message is defined %}
{%- set tools_in_user_message = false %}
{%- endif %}
Passing extra flags is already supported in vLLM's OpenAI-compatible API via the chat_template_kwargs field in the request JSON, but this field is not exposed by the openai client library, making it awkward to use. It would therefore be nice if extra context could be injected into the chat template so it can conditionally produce different prompts.
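As an aside, one partial workaround today (assuming the official openai Python client, whose extra_body parameter merges additional fields into the request JSON) is to smuggle the field in that way. The helper below is hypothetical and only sketches the payload shape vLLM expects:

```python
# Hypothetical helper: build the extra_body payload carrying vLLM's
# per-request chat_template_kwargs field.
def make_extra_body(**template_kwargs):
    """Wrap template variables in vLLM's chat_template_kwargs field."""
    return {"chat_template_kwargs": dict(template_kwargs)}


# Example usage with the openai client (server URL and model are examples):
#   client.chat.completions.create(
#       model="meta-llama/Llama-3.1-8B-Instruct",
#       messages=messages,
#       tools=tools,
#       extra_body=make_extra_body(tools_in_user_message=True),
#   )
```

This still requires the caller to know which flags each model's template wants, which is exactly the knowledge this proposal would move into the template itself.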
To illustrate the idea, here is a simplistic PoC that adds the model name as a variable passed to the chat template:
$ git diff vllm/entrypoints/openai/serving_chat.py
diff --git a/vllm/entrypoints/openai/serving_chat.py b/vllm/entrypoints/openai/serving_chat.py
index eee8076b..69e21511 100644
--- a/vllm/entrypoints/openai/serving_chat.py
+++ b/vllm/entrypoints/openai/serving_chat.py
@@ -132,6 +132,9 @@ class OpenAIServingChat(OpenAIServing):
tool.model_dump() for tool in request.tools
]
+ chat_template_kwargs = request.chat_template_kwargs or {}
+ chat_template_kwargs["model"] = request.model
+
prompt: Union[str, List[int]]
is_mistral_tokenizer = isinstance(tokenizer, MistralTokenizer)
if is_mistral_tokenizer:
@@ -142,7 +145,7 @@ class OpenAIServingChat(OpenAIServing):
add_generation_prompt=request.add_generation_prompt,
tools=tool_dicts,
documents=request.documents,
- **(request.chat_template_kwargs or {}),
+ **chat_template_kwargs,
)
else:
prompt = apply_hf_chat_template(
@@ -152,7 +155,7 @@ class OpenAIServingChat(OpenAIServing):
add_generation_prompt=request.add_generation_prompt,
tools=tool_dicts,
documents=request.documents,
- **(request.chat_template_kwargs or {}),
+ **chat_template_kwargs,
)
except Exception as e:
logger.error("Error in applying chat template from request: %s", e)
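In plain Python, the merge the diff performs amounts to the following (hypothetical helper name, just restating the PoC's behavior):

```python
def build_template_kwargs(request_kwargs, model):
    """Merge per-request template kwargs with the served model name,
    mirroring the PoC diff: the template can then branch on `model`.
    Note the copy, so the request object itself is not mutated."""
    kwargs = dict(request_kwargs or {})
    kwargs["model"] = model
    return kwargs
```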
The chat template can then do something like this:
{%- if not tools_in_user_message is defined %}
{%- if model is defined and "3.1" in model %}
{%- set tools_in_user_message = true %}
{%- else %}
{%- set tools_in_user_message = false %}
{%- endif %}
{%- endif %}
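For illustration, the same decision in plain Python (a hypothetical helper; it shares the fragility of substring-matching the served model name, which is part of why something more robust is wanted):

```python
def tools_in_user_message(model=None):
    """Mirror the Jinja conditional above: put tools in the first user
    message only for model names containing "3.1"; default to False."""
    return model is not None and "3.1" in model
```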
Ideally, though, the mechanism would be something more robust than the example above.
cc: @njhill @K-Mistele
Alternatives
No response
Additional context
No response