Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] Is there a way to turn weave objects (tracetables, boxed types, etc....) back into the original python objects? #1602

Open
darinkishore opened this issue May 4, 2024 · 5 comments
Labels
backlog The issue has been triaged and added to our backlog for prioritization. enhancement New feature or request

Comments

@darinkishore
Copy link

darinkishore commented May 4, 2024

Hi! So I have a weave object, initialized as follows:

class SemanticMemoryExample(BaseModel):
    name: str
    text: str
    memory: str
    inputs: list[str] = ["name", "text"]


class SemanticMemoryExampleDataset(weave.Object, BaseModel):
    name: str = "semantic_memory_perltmem_dspy_first_examples"
    description: str = "First foray into semantic memory using DSPY"
    examples: list[SemanticMemoryExample]

The wrapper "SemanticMemoryExampleDataset" is being used because name is an already defined attribute for weave.Object.

I'd like to save and load this object, so I run

def publish_dataset(
    examples: list[SemanticMemoryExample],
) -> SemanticMemoryExampleDataset:
    dataset = SemanticMemoryExampleDataset(examples=examples)
    weave.publish(dataset)
    return dataset
def retrieve_examples(
    dataset_ref: SemanticMemoryExampleDataset,
) -> list[SemanticMemoryExample]:
    retrieved_examples: list[SemanticMemoryExample] = []
    for example in dataset_ref.examples:
        retrieved_examples.append(example)  # here, this doesn't go recursively

    return retrieved_examples

However, retrieve_examples returns

TraceObject(ObjectRecord({'name': BoxedStr('...'), 'text': BoxedStr("...."), 'memory': BoxedStr('...'), 'inputs': TraceList(['name', 'text']), '_class_name': 'SemanticMemoryExample', '_bases': TraceList(['BaseModel']), 'map_values': <bound method ObjectRecord.map_values of ObjectRecord({...})>}))

I see that Boxed objects have an unbox() method, so I can unbox the name, text, and memory by calling from weave.box import unbox and unbox(thing) for thing in list

But I don't see a way to convert inputs, a TraceList back into a native python datatype. Also, the manually converting everything is a hassle—is there a weave function planned (or already existing) that turns the TraceList, TraceTable, etc... objects back into native python datatypes?

Loving the library—there were a LOT of good decisions made as far as what to focus on and DX. This is an almost ideal solution for me.

@darinkishore
Copy link
Author

I'm being silly—You can just cast it back into a list, inputs=list(example.inputs).

@jwlee64
Copy link
Contributor

jwlee64 commented May 7, 2024

Hi @darinkishore just wanted to confirm that all is good here and that we can close this?

@darinkishore
Copy link
Author

Hi! Thank you for checking—My main question is still unresolved!

Can you turn weave objects back into their native python objects?

Usually to preserve state, some classes can't set everything up at creation time!

Also, the changed type of all inside attributes is inconvenient to keep track of and work around in code, especially if I use lots of different weave objects.

@tssweeney
Copy link
Collaborator

Hi @darinkishore - thank you very much for your feedback and comments. Your request is very reasonable. Reading your use case, I am extracting 3 distinct asks:

  1. The ability to construct the original runtime class when loading published data
  2. Ideally Trace*, *Record, and Boxed* type classes are transparent to the user as they deviate from the expected types in code (at the very least it should be easy to recursively strip away this representation)
  3. (Implied from first comment): The special name field in our Object class can conflict with user-defined fields.

Spitballing some API ideas: I wonder if there could be a higher level class method that could make this easier (some pseudo code):

class Object():
   # ...

   @classmethod
   def load(cls, data: "Object" | dict | TraceObject) -> "Object":
      """
      """
      if isinstance(data, cls):
         return data
      elif isinstance(data, dict):
         return cls.model_validate(dict)
      elif isinstance(data, TraceObject):
         return cls.load(weave.unwrap(data))
      else:
          raise

this would allow you to run SemanticMemoryExampleDataset.load(...) to ensure you have the right class.


In any case, these are great requests and we need to think about a good design to improve this. Probably need to come back with more ideas/options before taking action

@tssweeney tssweeney added the enhancement New feature or request label May 20, 2024
@tssweeney
Copy link
Collaborator

Internal backlog link: https://wandb.atlassian.net/browse/WB-18889

@tssweeney tssweeney added the backlog The issue has been triaged and added to our backlog for prioritization. label May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog The issue has been triaged and added to our backlog for prioritization. enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants