Endpoint interface for inpainting models that require two images. #282

Open

OrderAndCh4oS opened this issue May 30, 2023 · 5 comments
Is your feature request related to a problem? Please describe.
There doesn't appear to be an endpoint interface for Stable Diffusion inpainting models that require two image files, the base image and a mask.

Describe the solution you'd like
It would be handy to have an interface for these models so that the Hosted inference API widget would work on the model card views.

Describe alternatives you've considered
I recently had to create an endpoint for ControlNet inpainting: OrderAndChaos/controlnet-inpaint-endpoint. This was based on philschmid/stable-diffusion-2-inpainting-endpoint.
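
For reference, a minimal sketch of the handler pattern, following the `EndpointHandler` convention used by custom Inference Endpoints handlers; the payload field names (`image`, `mask_image`) are assumptions rather than an established schema:

```python
import base64
from io import BytesIO

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline


class EndpointHandler:
    def __init__(self, path=""):
        # Load the inpainting pipeline from the repository the endpoint serves.
        self.pipe = StableDiffusionInpaintPipeline.from_pretrained(
            path, torch_dtype=torch.float16
        ).to("cuda")

    def __call__(self, data):
        inputs = data.get("inputs", data)
        prompt = inputs["prompt"]
        # Both images arrive base64-encoded, since the payload is JSON.
        image = self._decode(inputs["image"])       # assumed field name
        mask = self._decode(inputs["mask_image"])   # assumed field name
        out = self.pipe(prompt=prompt, image=image, mask_image=mask).images[0]
        buf = BytesIO()
        out.save(buf, format="PNG")
        return {"image": base64.b64encode(buf.getvalue()).decode()}

    @staticmethod
    def _decode(b64: str) -> Image.Image:
        return Image.open(BytesIO(base64.b64decode(b64))).convert("RGB")
```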

Additional Context
Originally asked here: huggingface/huggingface_hub#1486

Narsil (Contributor) commented Jun 5, 2023

@pcuenca @mishig25

This seems like a valid use case, wdyt? Any models particularly fit for that?
Maybe we should consider a highly diffusers-specific component (akin to Adobe?) that could be much more general, to try to limit the number of different widgets/pipelines.

AFAIK there's:

  • Inpainting
  • ControlNet (lots of possible additional masks/information)
  • Outpainting
  • Prompt + Prompt parsing
  • Negative prompt

I'm not saying we should support everything; I'm listing things I'm aware of that could be nice to add, so we can try to think of a single widget (at most a pair) that handles as many cases as possible.
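
For a sense of the parameter surface a general widget would need to cover, here is a sketch using diffusers' standard inpainting pipeline; the model id is a real inpainting checkpoint, but the prompts and image URLs are placeholders, and ControlNet variants would stack extra conditioning images on top of these inputs:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

# Inpainting alone already takes a prompt, a negative prompt,
# a base image, and a mask.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

base = load_image("https://example.com/base.png")  # placeholder URL
mask = load_image("https://example.com/mask.png")  # placeholder URL

result = pipe(
    prompt="a red brick fireplace",         # prompt
    negative_prompt="blurry, low quality",  # negative prompt
    image=base,                             # base image
    mask_image=mask,                        # inpainting mask
).images[0]
```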

pcuenca (Member) commented Jun 5, 2023

There are some specialist in-painting models, but by and large most Stable Diffusion models are capable of many of the tasks @Narsil outlined, at least to some extent.

Perhaps we could generalize to some sort of text_plus_image_to_image task, where the nature of the input image depends on the particular model used (it could be a mask for in-painting, or a regular image for image-to-image generation, ...). Even so, it sounds tricky to cover all the flexibility used in methods such as ControlNet.
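
A rough sketch of what a request to such a generalized task could look like; only the task name comes from this proposal, and every field name below is hypothetical:

```python
import base64
import requests

def b64(path: str) -> str:
    # JSON payloads can't carry raw bytes, so images go in base64-encoded.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Hypothetical payload for a text_plus_image_to_image task: the second image
# would be a mask for in-painting, or a conditioning image for ControlNet.
payload = {
    "inputs": {
        "prompt": "replace the sky with a sunset",
        "image": b64("base.png"),
        "extra_image": b64("mask.png"),
    }
}

# The URL's <model-id> and the token are placeholders.
response = requests.post(
    "https://api-inference.huggingface.co/models/<model-id>",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)
```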

Is the final goal here a richer representation of ControlNet?

Narsil (Contributor) commented Jun 6, 2023

Is the final goal here a richer representation of ControlNet?

I'm guessing the final goal is to showcase as best as possible what models can do.
There's definitely a tradeoff between showcasing everything and the bare minimum.

Currently it's the bare minimum, which misses the chance to show off some very nice properties; Spaces can handle arbitrarily complex interfaces. I'm mostly raising the question so we can think about whether something better than the current widget is possible.

pcuenca (Member) commented Jun 6, 2023

Yes, of course!

My question was more about which models should be tagged with these new widgets vs. just with the text-to-image task. If I understand it correctly, the widget to show is determined by the pipeline_task, so it's a 1-to-1 relationship, is that right?

cc @apolinario, he always has good ideas about these things

Narsil (Contributor) commented Jun 7, 2023

…so it's a 1-to-1 relationship, is that right?

Indeed!
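
As a toy illustration of that 1-to-1 relationship, each pipeline tag resolves to exactly one widget, so a model card can only ever render one widget; the component names below are hypothetical, not the Hub's actual ones:

```python
# Hypothetical mapping; the Hub's real widget names may differ.
WIDGET_FOR_TASK = {
    "text-to-image": "TextToImageWidget",
    "image-to-image": "ImageToImageWidget",
    # The generalized task proposed above would slot in as one more entry:
    "text_plus_image_to_image": "TextPlusImageToImageWidget",
}

def widget_for(pipeline_tag: str) -> str:
    # One tag, one widget: the 1-to-1 relationship discussed above.
    return WIDGET_FOR_TASK[pipeline_tag]
```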
