feat: Modifying schema to support multi modal inputs. #1673

rohit-rptless · 2024-08-20T23:11:46Z

feat: support for multi-modal input - work in progress - schema change

Based on the interface that OpenAI provides.
User can provide a list of
[{"type": "", "image_url": ""}]
that gets passed to the model.

Please describe the purpose of this pull request.
Modifying schema for adding Multimodal support - User can give text and image.

How to test
Not testable. Only for reviewing high level schema changes. Won't be merged without more commits.

Have you tested this PR?
No.

Based on the interface that OpenAI provides. User can provide a list of [{"type": "", "image_url": ""}] that gets passed to the model.

cpacker · 2024-08-20T23:42:35Z

memgpt/agent_store/db.py

@@ -156,6 +156,7 @@ class MessageModel(Base):
            # openai info
            role = Column(String, nullable=False)
            text = Column(String)  # optional: can be null if function call


Here we should probably delete
text = Column(String) # optional: can be null if function call

and replace with
content = Column(JSON) # optional: multi-modal input

which in the pydantic model is Optional[Union[str, List[MultiModalMessagePart]]]

but in the database itself is stored as an optional JSON field

cpacker · 2024-08-20T23:43:28Z

memgpt/agent_store/db.py

@@ -192,6 +193,7 @@ def to_record(self):
                    role=self.role,
                    name=self.name,
                    text=self.text,


Similarly, should delete/deprecate text and replace with content which can be None, str (text-only), or List[dict/MultiModalMessagePart] (multi-modal).

cpacker · 2024-08-20T23:43:50Z

memgpt/schemas/message.py

@@ -62,6 +62,8 @@ class Message(BaseMessage):
    id: str = BaseMessage.generate_id_field()
    role: MessageRole = Field(..., description="The role of the participant.")
    text: str = Field(..., description="The text of the message.")
+    # Field mm_content is only used when role is 'user'. It needs to be mapped to MultiModalMessage


Same comment here - deprecating text and replacing with a new content that matches OpenAI

cpacker · 2024-08-20T23:44:08Z

memgpt/schemas/message.py

@@ -223,8 +225,9 @@ def to_openai_dict(

        elif self.role == "user":
            assert all([v is not None for v in [self.text, self.role]]), vars(self)
+            content = self.mm_content if self.mm_content is not None else self.text


Once we do the above (replace text) we should no longer need this

cpacker · 2024-08-20T23:44:57Z

memgpt/schemas/openai/chat_completion_request.py

@@ -8,13 +8,15 @@ class SystemMessage(BaseModel):
    role: str = "system"
    name: Optional[str] = None

+class MultiModalMessage(BaseModel):


We probably need to expand this to have text and image_url:

Removing the text field. Adding content instead of mm_content.

rohit-rptless added 2 commits August 20, 2024 15:51

Modifying schema to support multi modal inputs.

79278ea

Based on the interface that OpenAI provides. User can provide a list of [{"type": "", "image_url": ""}] that gets passed to the model.

Adding a field into the schema.

8f57be8

cpacker reviewed Aug 20, 2024

View reviewed changes

Changes based on direct feedback.

7657d41

Removing the text field. Adding content instead of mm_content.

cpacker changed the title ~~Modifying schema to support multi modal inputs.~~ refactor: Modifying schema to support multi modal inputs. Aug 20, 2024

cpacker changed the title ~~refactor: Modifying schema to support multi modal inputs.~~ feat: Modifying schema to support multi modal inputs. Aug 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Modifying schema to support multi modal inputs. #1673

feat: Modifying schema to support multi modal inputs. #1673

rohit-rptless commented Aug 20, 2024 •

edited

Loading

cpacker Aug 20, 2024

cpacker Aug 20, 2024

cpacker Aug 20, 2024

cpacker Aug 20, 2024

cpacker Aug 20, 2024

feat: Modifying schema to support multi modal inputs. #1673

Are you sure you want to change the base?

feat: Modifying schema to support multi modal inputs. #1673

Conversation

rohit-rptless commented Aug 20, 2024 • edited Loading

cpacker Aug 20, 2024

Choose a reason for hiding this comment

cpacker Aug 20, 2024

Choose a reason for hiding this comment

cpacker Aug 20, 2024

Choose a reason for hiding this comment

cpacker Aug 20, 2024

Choose a reason for hiding this comment

cpacker Aug 20, 2024

Choose a reason for hiding this comment

rohit-rptless commented Aug 20, 2024 •

edited

Loading