Is there a way to have a tool return image bytes? #10

Open
Taytay opened this issue Jun 23, 2024 · 3 comments
Labels: good first issue (Good for newcomers)

Comments

@Taytay

Taytay commented Jun 23, 2024

First, thanks for claudette. It's very elegant.

toolslm doesn't appear to map the "bytes" type to anything, which I guess makes sense.
However, that means I'm having a hard time using the toolloop if I include my image-returning function.

def get_image_of_puppy() -> bytes:
    "Returns an image of a puppy"
    image: Path = Path("puppy.jpg")
    return image.read_bytes()

tools = [get_image_of_puppy]

chat = Chat(model, tools=tools)
r = chat.toolloop("Describe the puppy image")
print(contents(r))

Error:

Traceback (most recent call last):
  File "/Users/taytay/projects/llm-browser-driver/src/llm_browser_driver/playground/./test_claude.py", line 28, in <module>
    r = chat.toolloop("Describe the puppy image")
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/claudette/toolloop.py", line 23, in toolloop
    r = self(pr, **kwargs)
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/claudette/core.py", line 200, in __call__
    if self.tools: kw['tools'] = [get_schema(o) for o in self.tools]
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/claudette/core.py", line 200, in <listcomp>
    if self.tools: kw['tools'] = [get_schema(o) for o in self.tools]
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/toolslm/funccall.py", line 43, in get_schema
    if ret.anno is not empty: desc += f'\n\nReturns:\n- type: {_types(ret.anno)[0]}'
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/toolslm/funccall.py", line 20, in _types
    else: return tmap[t], None
KeyError: <class 'bytes'>

Which of course comes from:

# %% ../01_funccall.ipynb 11
def _types(t:type)->tuple[str,Optional[str]]:
    "Tuple of json schema type name and (if appropriate) array item name."
    if t is empty: raise TypeError('Missing type')
    tmap = {int:"integer", float:"number", str:"string", bool:"boolean", list:"array", dict:"object"}
    if getattr(t, '__origin__', None) in  (list,tuple): return "array", tmap.get(t.__args__[0], "object")
    else: return tmap[t], None
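
For readers following along, the failure can be reproduced in isolation. The sketch below copies only the mapping from the quoted `_types`. The names `TMAP`, `lookup_strict` and `lookup_lenient` are made up for illustration and are not part of toolslm, and the lenient fallback is just one possible behaviour, not what the library does.

```python
# Copy of the mapping quoted above; note that `bytes` has no entry.
TMAP = {int: "integer", float: "number", str: "string",
        bool: "boolean", list: "array", dict: "object"}

def lookup_strict(t: type) -> str:
    "Mirrors the quoted `tmap[t]` lookup: raises KeyError for unmapped types."
    return TMAP[t]

def lookup_lenient(t: type, default: str = "object") -> str:
    "Hypothetical alternative: fall back to `default` for unmapped types."
    return TMAP.get(t, default)

try:
    lookup_strict(bytes)
    raised = False
except KeyError as e:
    raised = True
    print(f"KeyError: {e}")  # same failure as the last line of the traceback

print(lookup_lenient(bytes))  # falls back instead of raising
```

A fallback like this only gets past schema generation. The tool's return value would still need to be converted into an image block before it goes back to the model.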

I tried returning a dict describing the image, but Claude complains that there are too many input tokens. I need the image to be sent back into the chat as raw bytes so that the chat's machinery kicks in and converts it to an image block that Claude understands:

def img_msg(data:bytes)->dict:
    "Convert image `data` into an encoded `dict`"
    img = base64.b64encode(data).decode("utf-8")
    mtype = mimetypes.types_map['.'+imghdr.what(None, h=data)]
    r = dict(type="base64", media_type=mtype, data=img)
    return {"type": "image", "source": r}
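
One caveat about the helper quoted above: it relies on `imghdr`, which was deprecated in Python 3.11 and removed from the standard library in 3.13. The following is a minimal sketch of the same output shape that sniffs the leading bytes instead. `sniff_media_type` and `img_block` are hypothetical names, not claudette functions, and only the four image types the Anthropic API documents as supported are handled.

```python
import base64

def sniff_media_type(data: bytes) -> str:
    "Guess an image MIME type from the first bytes of `data` (hypothetical helper)."
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unrecognised image format")

def img_block(data: bytes) -> dict:
    "Build a dict with the same shape as the quoted `img_msg`, without `imghdr`."
    encoded = base64.b64encode(data).decode("utf-8")
    source = dict(type="base64", media_type=sniff_media_type(data), data=encoded)
    return {"type": "image", "source": source}
```

Raising on unknown formats is a design choice for this sketch. Silently guessing a media type would push the error to the API call, where it is harder to diagnose.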

I could probably hack it together, but I keep feeling like I'm missing something obvious?

@jph00
Contributor

jph00 commented Jun 24, 2024

Are there any examples in the Anthropic docs of a tool returning something that's not a string? If so, please provide a link and we'll try to get it working. Or alternatively, if you're able to show an example that works with the plain Anthropic class, link to a gist or repo so we can see what's needed.

@Taytay
Author

Taytay commented Jun 24, 2024

I think this is the example from their docs that covers returning an image:
https://docs.anthropic.com/en/docs/build-with-claude/tool-use#example-of-tool-result-with-images

Excerpt


{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": [
        {"type": "text", "text": "15 degrees"},
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "/9j/4AAQSkZJRg...",
          }
        }
      ]
    }
  ]
}
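
To make the shape of that excerpt concrete, here is a small sketch that wraps a tool's return value in a message of the same structure. `tool_result_message` is a hypothetical helper, not part of claudette or the Anthropic SDK, and the `media_type` default of `image/jpeg` is an assumption that the caller should override for other formats.

```python
import base64
from typing import Any

def tool_result_message(tool_use_id: str, result: Any,
                        media_type: str = "image/jpeg") -> dict:
    "Wrap a tool's return value in a user message shaped like the excerpt above."
    if isinstance(result, bytes):
        # Assumption: any bytes result is an image of type `media_type`.
        data = base64.b64encode(result).decode("utf-8")
        content = [{"type": "image",
                    "source": {"type": "base64",
                               "media_type": media_type,
                               "data": data}}]
    else:
        # Non-bytes results are passed back as plain text.
        content = str(result)
    block = {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    return {"role": "user", "content": [block]}

# Usage with a placeholder id; a real id comes from the model's tool_use block.
msg = tool_result_message("toolu_placeholder", b"\xff\xd8\xff fake jpeg bytes")
print(msg["content"][0]["content"][0]["source"]["media_type"])
```

The key point is that the image travels as a content block inside `tool_result`, not as a stringified dict, so it is not sent to the model as a long run of base64 text.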

I'll try to hack on this more too when I get a chance and will let you know if I find a good way.

@Taytay
Author

Taytay commented Jun 24, 2024

Okay - got it working. (Thanks to @patch for making this easy!)

Feel free to make this more elegant! :)

import inspect
import logging
import os

os.environ["ANTHROPIC_LOG"] = "debug"

from pathlib import Path
from typing import Optional

import claudette.core
import toolslm.funccall
from claudette import Chat, contents
from claudette.core import ToolUseBlock, _mk_ns, abc, img_msg
from fastcore.utils import patch_to

empty = inspect.Parameter.empty


@patch_to(toolslm.funccall)
def _types(t: type) -> tuple[str, Optional[str]]:
    "Tuple of json schema type name and (if appropriate) array item name."
    if t is empty:
        raise TypeError("Missing type")
    tmap = {
        int: "integer",
        float: "number",
        str: "string",
        bool: "boolean",
        list: "array",
        dict: "object",
        # Bytes is assumed to be an image for now
        # We could likely add a better type to indicate this
        bytes: {
            "type": "object",
            "properties": {
                "type": {"type": "string", "enum": ["image"]},
                "source": {
                    "type": "object",
                    "properties": {
                        "type": {"type": "string", "enum": ["base64"]},
                        "media_type": {"type": "string"},
                        "data": {"type": "string"},
                    },
                    "required": ["type", "media_type", "data"],
                },
            },
        },
    }
    if getattr(t, "__origin__", None) in (list, tuple):
        return "array", tmap.get(t.__args__[0], "object")
    else:
        return tmap[t], None


@patch_to(claudette.core)
def call_func(fc: ToolUseBlock, ns: Optional[abc.Mapping] = None, obj: Optional = None):
    "Call the function named in the tool use block `fc`, using namespace `ns`."
    if ns is None:
        ns = globals()
    if not isinstance(ns, abc.Mapping):
        ns = _mk_ns(*ns)
    func = getattr(obj, fc.name, None)
    if not func:
        func = ns[fc.name]
    res = func(**fc.input)
    if isinstance(res, bytes):
        # If the result is bytes, assume it's an image
        return dict(type="tool_result", tool_use_id=fc.id, content=[img_msg(res)])
    return dict(type="tool_result", tool_use_id=fc.id, content=str(res))


def get_image_of_puppy() -> bytes:
    "Returns an image of a puppy"
    image: Path = Path("samples/puppy.jpeg")
    return image.read_bytes()


def get_object_and_properties() -> dict:
    "Returns a dict with a couple of integer properties called x and y"
    return {"x": 1, "y": 2}


def get_str() -> str:
    "Returns a random string"
    return "foo!"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    tools = [get_image_of_puppy, get_object_and_properties, get_str]

    chat = Chat("claude-3-5-sonnet-20240620", tools=tools)
    r = chat.toolloop(
        "Tell me what tools you have access to please, and what you expect each of them to return to you. Then, examine and describe the puppy"
    )
    print(contents(r))

Full gist of response here:
https://gist.github.com/Taytay/7191d5f5722d3ed8c000a938e11b26cd

jph00 added the "good first issue" label Aug 4, 2024