Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Modifying schema to support multi modal inputs. #1673

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rohit-rptless
Copy link

@rohit-rptless rohit-rptless commented Aug 20, 2024

feat: support for multi-modal input - work in progress - schema change

Based on the interface that OpenAI provides.
User can provide a list of
[{"type": "", "image_url": ""}]
that gets passed to the model.

Please describe the purpose of this pull request.
Modifying schema for adding Multimodal support - User can give text and image.

How to test
Not testable. Only for reviewing high level schema changes. Won't be merged without more commits.

Have you tested this PR?
No.

Based on the interface that OpenAI provides.
User can provide a list of
[{"type": "", "image_url": ""}]
that gets passed to the model.
@@ -156,6 +156,7 @@ class MessageModel(Base):
# openai info
role = Column(String, nullable=False)
text = Column(String) # optional: can be null if function call
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should probably delete
text = Column(String) # optional: can be null if function call

and replace with
content = Column(JSON) # optional: multi-modal input

which in the pydantic model is Optional[Union[str, List[MultiModalMessagePart]]]

but in the database itself is stored as an optional JSON field

@@ -192,6 +193,7 @@ def to_record(self):
role=self.role,
name=self.name,
text=self.text,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, should delete/deprecate text and replace with content which can be None, str (text-only), or List[dict/MultiModalMessagePart] (multi-modal).

@@ -62,6 +62,8 @@ class Message(BaseMessage):
id: str = BaseMessage.generate_id_field()
role: MessageRole = Field(..., description="The role of the participant.")
text: str = Field(..., description="The text of the message.")
# Field mm_content is only used when role is 'user'. It needs to be mapped to MultiModalMessage
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here - deprecating text and replacing with a new content that matches OpenAI

@@ -223,8 +225,9 @@ def to_openai_dict(

elif self.role == "user":
assert all([v is not None for v in [self.text, self.role]]), vars(self)
content = self.mm_content if self.mm_content is not None else self.text
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we do the above (replace text) we should no longer need this

@@ -8,13 +8,15 @@ class SystemMessage(BaseModel):
role: str = "system"
name: Optional[str] = None

class MultiModalMessage(BaseModel):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to expand this to have text and image_url:

image

Removing the text field.
Adding content instead of mm_content.
@cpacker cpacker changed the title Modifying schema to support multi modal inputs. refactor: Modifying schema to support multi modal inputs. Aug 20, 2024
@cpacker cpacker changed the title refactor: Modifying schema to support multi modal inputs. feat: Modifying schema to support multi modal inputs. Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: To triage
Development

Successfully merging this pull request may close these issues.

2 participants