[Feature]: Add model context information to chat template #8869

Open · 1 task done

maxdebayser (Contributor) opened this issue Sep 26, 2024 · 0 comments
🚀 The feature, motivation and pitch

I'm currently working on tool-use PRs and I'm seeing that some models are very sensitive to the given prompt, so it would be nice to be able to detect which model is being used in the chat template and adjust the input accordingly.

For example, Llama 3.1 seems to perform better if the tool list is passed in the first user message, whereas Llama 3.2 seems to prefer the tools in the system prompt. In the Llama chat template this behavior is controlled by the tools_in_user_message flag, which can be passed in the tokenizer.apply_chat_template() call:

{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = false %}
{%- endif %}
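
For reference, extra keyword arguments to tokenizer.apply_chat_template() are forwarded to the template as Jinja variables, so this flag can be flipped per call. A minimal sketch (the checkpoint name and tool schema here are just illustrative):

from transformers import AutoTokenizer

# Illustrative checkpoint; any model whose chat template honors
# tools_in_user_message behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# tools_in_user_message is not a named parameter of apply_chat_template();
# it is passed through **kwargs and lands in the template as a variable.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
    tools_in_user_message=True,
)
print(prompt)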

Passing extra flags is already supported in vLLM's version of the OpenAI API through the chat_template_kwargs field in the request JSON, but the openai client library has no first-class support for this field, making it hard to use. It would therefore be nice if extra context could be inserted into the chat template to conditionally create different prompts.
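
For completeness, the field can currently be smuggled through the client's generic extra_body escape hatch (assuming openai-python ≥ 1.0); a sketch against a local vLLM server with an illustrative model name:

from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server on the default port;
# the model name is illustrative.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    # chat_template_kwargs is a vLLM extension, so it has no named
    # argument in the client and must go through extra_body.
    extra_body={"chat_template_kwargs": {"tools_in_user_message": True}},
)
print(response.choices[0].message.content)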

To illustrate the idea, here is a simplistic PoC that adds the model name as a variable passed to the chat template:

$ git diff vllm/entrypoints/openai/serving_chat.py
diff --git a/vllm/entrypoints/openai/serving_chat.py b/vllm/entrypoints/openai/serving_chat.py
index eee8076b..69e21511 100644
--- a/vllm/entrypoints/openai/serving_chat.py
+++ b/vllm/entrypoints/openai/serving_chat.py
@@ -132,6 +132,9 @@ class OpenAIServingChat(OpenAIServing):
                 tool.model_dump() for tool in request.tools
             ]
 
+            chat_template_kwargs = request.chat_template_kwargs or {}
+            chat_template_kwargs["model"] = request.model
+
             prompt: Union[str, List[int]]
             is_mistral_tokenizer = isinstance(tokenizer, MistralTokenizer)
             if is_mistral_tokenizer:
@@ -142,7 +145,7 @@ class OpenAIServingChat(OpenAIServing):
                     add_generation_prompt=request.add_generation_prompt,
                     tools=tool_dicts,
                     documents=request.documents,
-                    **(request.chat_template_kwargs or {}),
+                    **chat_template_kwargs,
                 )
             else:
                 prompt = apply_hf_chat_template(
@@ -152,7 +155,7 @@ class OpenAIServingChat(OpenAIServing):
                     add_generation_prompt=request.add_generation_prompt,
                     tools=tool_dicts,
                     documents=request.documents,
-                    **(request.chat_template_kwargs or {}),
+                    **chat_template_kwargs,
                 )
         except Exception as e:
             logger.error("Error in applying chat template from request: %s", e)

And then the chat template can do stuff like this:

{%- if not tools_in_user_message is defined %}
    {%- if model is defined and "3.1" in model %}
        {%- set tools_in_user_message = true %}
    {%- else %}
        {%- set tools_in_user_message = false %}
    {%- endif %}
{%- endif %}
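
With the PoC applied, an ordinary request is enough to select the right branch, since its model field reaches the template automatically; a sketch against a local server:

import requests

# Sketch: the "model" field below is what the patched serving_chat.py
# injects into the template, so a 3.1 checkpoint name would make the
# template above set tools_in_user_message = true.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])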

But ideally it would be something more robust than the substring match in the template above.

cc: @njhill @K-Mistele

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.