response_format with regex does not seem to work #2423

Open
aymeric-roucher opened this issue Jul 26, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@aymeric-roucher

Describe the bug

When passing a response_format of type regex to chat_completion, the output does not always respect the format.

Reproduction

This does not follow the regex:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3.1-8B-Instruct")

output = client.chat_completion(
    [{"role": "user", "content": "ok"}],
    response_format={"type": "regex", "value": ".+?\n\nCode:+?"},
)

print(output.choices[0].message.content)
```

But going through the OpenAI Messages API does work:

```python
from openai import OpenAI

url = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-8B-Instruct/v1"

# init the client but point it to TGI
client = OpenAI(
    base_url=url,
    api_key="*",
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "ok"}],
    response_format={"type": "regex", "value": ".+?\n\nCode:+?"},
)
print(chat_completion.choices[0].message.content)
```
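A quick way to make the failure objective rather than judged by eye is to check the returned content against the same pattern locally. This is a sketch using Python's `re` module; `re.DOTALL` is assumed here so `.` can cross newlines, which may not match TGI's exact constrained-decoding semantics:

```python
import re

# The response_format regex from the report: anything, a blank line,
# then "Code" followed by one or more colons.
PATTERN = ".+?\n\nCode:+?"

def respects_format(text: str) -> bool:
    # fullmatch requires the entire output to conform to the pattern;
    # DOTALL lets "." span newlines. A loose local check, not TGI's code.
    return re.fullmatch(PATTERN, text, flags=re.DOTALL) is not None

assert respects_format("some reasoning\n\nCode:")
assert not respects_format("It seems like you're ready to chat.")
```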

Logs

No response

System info

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.24.2
- Platform: macOS-14.1-arm64-arm-64bit
- Python version: 3.10.14
- Running in iPython ?: Yes
- iPython shell: ZMQInteractiveShell
- Running in notebook ?: Yes
- Running in Google Colab ?: No
- Token path ?: /Users/aymeric/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: m-ric
- Configured git credential helpers: osxkeychain, store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.3.0
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.3.0
- hf_transfer: N/A
- gradio: 4.38.1
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.7.1
- aiohttp: 3.9.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /Users/aymeric/.cache/huggingface/hub
- HF_ASSETS_CACHE: /Users/aymeric/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/aymeric/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
@aymeric-roucher aymeric-roucher added the bug Something isn't working label Jul 26, 2024
@aymeric-roucher
Author

cc @Wauplin this follows up on #2383.

@Wauplin
Contributor

Wauplin commented Jul 30, 2024

Hi @aymeric-roucher, I've run some tests with the reproduction you shared. I do think this is a cache issue that needs to be fixed either in the Inference API or in TGI directly. The only difference between the OpenAI and InferenceClient calls in your example is that the OpenAI client passes model="tgi" while InferenceClient passes model="meta-llama/Meta-Llama-3.1-8B-Instruct". When I use InferenceClient and also pass model="tgi" (with base_url pointing to Llama 3.1), the regex works as expected.

I also tested the failing case with "ok 2" instead of "ok", and the answer correctly followed the regex. Finally, I disabled the cache with InferenceClient(..., headers={"x-use-cache": "0"}) and the answer again followed the regex, which implies the problem comes from an invalid cached answer.

I then tried to reproduce the error by sending a random string twice: first without a regex (to warm up the cache) and then with the regex (to check whether the cached answer would be reused). I did not manage to reproduce the bug this way.

I don't know what is specific about the "ok" message or how an invalid answer got cached for it. This is not a huggingface_hub.InferenceClient bug, but it still needs to be investigated server-side. cc @Narsil @OlivierDehaene

@Narsil
Contributor

Narsil commented Jul 30, 2024

The cache key is computed by hashing the entire input (parameters included, so the regex too).

This is unlikely to be a cache issue. The invalid answer may well have been cached, but it looks more like your regex constraint was ignored in the first place.

I think some regexes can be ignored under some circumstances (basically to avoid a critical failure). That is not intended behavior, however. If you're able to reproduce it, that would help a lot.
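The point above, that parameters are part of the cache key, can be sketched with a toy hash. This is a hypothetical helper for illustration, not TGI's actual implementation:

```python
import hashlib
import json

def cache_key(payload: dict) -> str:
    # Serialize the whole request deterministically (sorted keys) and hash it,
    # so any parameter change -- including response_format -- yields a new key.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

base = {"messages": [{"role": "user", "content": "ok"}]}
with_regex = {**base, "response_format": {"type": "regex", "value": ".+?\n\nCode:+?"}}

# Same payload -> same key; adding the regex parameter -> different key.
assert cache_key(base) == cache_key(dict(base))
assert cache_key(base) != cache_key(with_regex)
```

Under a scheme like this, the request with the regex could never collide with the plain request, which is why a stale unconstrained answer being served from cache would be surprising.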

@Wauplin
Contributor

Wauplin commented Jul 30, 2024

@Narsil I've been able to reproduce it with cache disabled by repeating the exact same request until it fails:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3.1-8B-Instruct", headers={"x-use-cache": "0"})

for i in range(50):
    output = client.chat_completion(
        [{"role": "user", "content": "ok"}],
        response_format={"type": "regex", "value": ".+?\n\nCode:+?"},
    )
    answer = output.choices[0].message.content

    if "Code:" in answer:
        print(f"Iteration {i}: OK")
    else:
        print(f"Iteration {i}: NOT OK\n{answer}")
        break
```

which outputs:

```
Iteration 0: OK
Iteration 1: OK
Iteration 2: OK
Iteration 3: OK
Iteration 4: OK
Iteration 5: OK
Iteration 6: NOT OK
It seems like you're ready to chat. Is there something specific you'd like to talk about or ask about? I can help with any questions or just have a conversation if you'd like. What's on your mind?  Would you like some suggestions? I could share some interesting topics if you are interested? like how the processor works, Some conspiracy theories or fun facts. let me know! :). There are some space news updates. Or there have been some epilepsy clues about satellites, other
```

Though it doesn't reproduce the error 100% of the time, it still happens once every few requests.

@Narsil
Contributor

Narsil commented Jul 30, 2024

Calling in @drbh on this. I knew it could happen; I didn't expect 6 iterations to be enough to trigger it.
