
Some InferenceClient tasks missing parameters argument, inconsistent with task specifications #2557

hanouticelina opened this issue Sep 20, 2024 · 7 comments · May be fixed by #2561
@hanouticelina
Contributor

Some task methods in huggingface_hub.InferenceClient do not include a parameters argument to allow passing additional inference parameters.
The affected tasks are: audio-classification, automatic-speech-recognition, fill-mask, image-classification, image-segmentation, object-detection, question-answering, table-question-answering, text-classification, token-classification and translation.
This inconsistency means the implementation is not fully aligned with the task specifications and the Inference API documentation.

Taking text-classification as an example:

With Python requests:

import os

import requests


API_URL = "https://api-inference.huggingface.co/models/distilbert/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": f"Bearer {os.environ.get('HF_TOKEN')}"}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query(
    {
        "inputs": "I like you. I love you",
        "parameters": {"top_k": 1},
    }
)

print(output)
[{'label': 'POSITIVE', 'score': 0.9998738765716553}]

With InferenceClient:

from huggingface_hub import InferenceClient

client = InferenceClient()

output = client.text_classification(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    text="I like you. I love you",
    parameters={"top_k": 1},
)

print(output)
output = client.text_classification(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: InferenceClient.text_classification() got an unexpected keyword argument 'parameters'
@hanouticelina hanouticelina self-assigned this Sep 20, 2024
@Wauplin
Contributor

Wauplin commented Sep 20, 2024

Yes, that is an identified problem. For context, when we first implemented the InferenceClient, we didn't have that much harmonization yet (the specs didn't exist). To avoid future maintenance, we decided not to add parameters to a lot of tasks (always better to do less and avoid future breaking changes). That's why the tasks you've listed above don't have parameters.

That being said, if we start adding parameters, we most likely would add a parameters parameter to the method signatures but individual parameters instead (e.g. client.text_classification(..., top_k=1)).

EDIT: another thing to take into consideration is that those tasks are much, much less used than the text generation or image generation ones, hence the reduced scope.

@hanouticelina
Contributor Author

Agreed, I also prefer adding a parameters argument rather than individual parameters, at least for those "less popular" tasks. Even though those tasks have low usage, I think it's still important to have the client aligned with the Inference API documentation.

@Wauplin
Contributor

Wauplin commented Sep 20, 2024

Sorry sorry, my bad 🙈

I meant we most likely wouldn't add a parameters parameter to the method signatures but individual parameters instead.

The problem with a parameters parameter is that it doesn't play well with annotations/autocomplete. Even if we start using typing.TypedDict as OpenAI does, it's still not as natively supported as explicit kwargs.
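
For illustration, a rough sketch of what the individual-kwargs approach could look like for text_classification. The parameter names (top_k, function_to_apply, ClassificationOutputTransform) come from this discussion; the overall signature and the return type are only assumptions, not the actual implementation:

from typing import List, Optional

# Hypothetical signature sketch; the real one would be derived from the
# auto-generated task specs.
def text_classification(
    self,
    text: str,
    *,
    model: Optional[str] = None,
    top_k: Optional[int] = None,  # limit the output to the top K most probable classes
    function_to_apply: Optional["ClassificationOutputTransform"] = None,
) -> List["TextClassificationOutputElement"]:
    ...

A call like client.text_classification("I like you. I love you", top_k=1) then gets autocomplete and static type checking for free, which a parameters dict (or a TypedDict) doesn't provide as naturally.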

@Wauplin
Contributor

Wauplin commented Sep 20, 2024

I think it's still important to have the client aligned with the Inference API documentation.

Makes sense now that we have proper documentation 👍

@Wauplin
Contributor

Wauplin commented Sep 20, 2024

Since we are talking about ~10 tasks that follow exactly the same pattern (i.e. inputs + parameters), I think we should look for a semi-automated way to do that as well. This is how I would picture it (taking text_classification as an example):

  • take the TextClassificationParameters class, which is auto-generated based on the specs
  • take the InferenceClient.text_classification method, which is for now entirely manually written
  • check whether any parameter is not listed in the text_classification method. If a parameter is missing,
    • add it to the method signature with the same type annotation as the dataclass (for instance function_to_apply: Optional["ClassificationOutputTransform"] = None)
    • add the attribute docstring (for instance """When specified, limits the output to the top K most probable classes.""") to the method docstring inside the Args section
    • make sure the imports are correct (typically, from ... import ClassificationOutputTransform)

With such a script, we would be sure all those "less used" methods are still well maintained according to the specs. In the past, I've created a few scripts to auto-generate things (see the ./utils folder), always based on static analysis, i.e. only working on the module text. For such a script here, I think we'll need to work at the AST level.

It's usually good for such a script to have a "check" mode (fails if something has to be changed) and a "format" mode (modifies the file in place). Once we have it, we can integrate it into the Makefile / CI. Let me know what you think :)
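
To make this concrete, a rough sketch of what the "check" mode could look like: it compares the fields of the generated parameters dataclass against the method signature parsed from the client source. The import path for the generated types and the client file location are assumptions, and a "format" mode would additionally need to rewrite the signature, docstring and imports at the AST level:

import ast
import dataclasses
from pathlib import Path

# Assumed locations; adjust to wherever the generated specs and client actually live.
from huggingface_hub.inference._generated.types import TextClassificationParameters

CLIENT_FILE = Path("src/huggingface_hub/inference/_client.py")


def method_arg_names(source: str, method_name: str) -> set:
    """Collect the argument names of a method by walking the module's AST."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == method_name:
            return {arg.arg for arg in node.args.args + node.args.kwonlyargs}
    raise ValueError(f"Method {method_name!r} not found")


spec_params = {field.name for field in dataclasses.fields(TextClassificationParameters)}
method_params = method_arg_names(CLIENT_FILE.read_text(), "text_classification")

missing = spec_params - method_params
if missing:
    raise SystemExit(f"text_classification is missing parameters: {sorted(missing)}")
print("text_classification is up to date with the specs")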

@hanouticelina
Contributor Author

Sorry sorry, my bad 🙈

I meant we most likely wouldn't add a parameters parameter to the method signatures but individual parameters instead.

The problem with a parameters parameter is that it doesn't play well with annotations/autocomplete. Even if we start using typing.TypedDict as OpenAI does, it's still not as natively supported as explicit kwargs.

Yes, I was actually thinking about adding parameters to avoid modifying the client code if there are future changes to the parameters on the API side. But it makes sense to add individual parameters; I don't think there will be many changes, so let's do that.

Since we are talking about ~10 tasks that follow exactly the same pattern (i.e. inputs + parameters), I think we should look for a semi-automated way to do that as well. This is how I would picture it (taking text_classification as an example):

  • take the TextClassificationParameters class, which is auto-generated based on the specs

  • take the InferenceClient.text_classification method, which is for now entirely manually written

  • check whether any parameter is not listed in the text_classification method. If a parameter is missing,

    • add it to the method signature with the same type annotation as the dataclass (for instance function_to_apply: Optional["ClassificationOutputTransform"] = None)
    • add the attribute docstring (for instance """When specified, limits the output to the top K most probable classes.""") to the method docstring inside the Args section
    • make sure the imports are correct (typically, from ... import ClassificationOutputTransform)

With such a script, we would be sure all those "less used" methods are still well maintained according to the specs. In the past, I've created a few scripts to auto-generate things (see the ./utils folder), always based on static analysis, i.e. only working on the module text. For such a script here, I think we'll need to work at the AST level.

It's usually good for such a script to have a "check" mode (fails if something has to be changed) and a "format" mode (modifies the file in place). Once we have it, we can integrate it into the Makefile / CI. Let me know what you think :)

Yes, I like the idea; it will be nice to have this in the Makefile/CI.
I think the script should:

  • Focus solely on adding missing parameters
  • Not handle parameter renaming or deprecation. For example, if function_to_apply is renamed to fn_to_apply, we'd still need to manually keep the old name and deprecate it. The idea is really to keep the script simple, with little maintenance required.

I will open a separate issue for this later 👍

@Wauplin
Contributor

Wauplin commented Sep 20, 2024

Not handle parameter renaming or deprecation. For example, if function_to_apply is renamed to fn_to_apply, we'd still need to manually keep the old name and deprecate it. The idea is really to keep the script simple, with little maintenance required.

Agree 👍 And if it becomes too annoying to maintain in the future, we can extend the script later.

I will open a separate issue for this later 👍

As you wish, it's also fine to start a PR directly or rename this issue.
