
[bug] "Large" Datasets can't be queried #2566

Open
TeoZosa opened this issue Oct 2, 2024 · 2 comments


TeoZosa commented Oct 2, 2024

Note

I saw #2470 and a few other PRs, and it looks like @tssweeney and co. are already working on resolving this, so I'm adding this issue as a +1 vote for that ongoing work.

This seems to be caused by the total data size rather than the number of rows: Datasets created from toy data work fine even at larger cardinalities.
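To show why data size, not row count, is the likely trigger, here is a stdlib-only sketch that measures how large one JSON body holding N duplicated rows would be. It assumes the whole table is serialized into a single JSON array per response; the row contents below are hypothetical and do not reflect Weave's actual serialization.

```python
import base64
import json


def payload_bytes(rows: list[dict]) -> int:
    """Byte size of `rows` serialized as one JSON array, i.e. one non-streamed response body."""
    return len(json.dumps(rows).encode("utf-8"))


# Hypothetical rows, not taken from the issue:
# a tiny "toy" row, and a "real-looking" row with verbose text plus a base64 image.
toy_row = {"input": "hi", "label": 1}
fake_image = base64.b64encode(bytes(300_000)).decode("ascii")  # 400,000 base64 characters
real_row = {"input": "lorem ipsum " * 500, "image": fake_image}

for n in (33, 34, 100):
    # The body grows linearly with n, but the slope is set by the per-row size.
    toy = payload_bytes([toy_row] * n)
    real = payload_bytes([real_row] * n)
    print(f"{n:>3} rows: toy={toy:,} B, real={real:,} B")
```

With rows like `real_row`, 33 copies already produce a body of roughly 13 MB, while 100 toy rows stay under 3 KB, which matches the observation that toy Datasets with many more rows still query fine.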

Steps to reproduce

With Datasets created by duplicating a single example that has fairly verbose string fields and base64 image data, publishing succeeds, but any Dataset with more than 33 examples fails to query.

```python
>>> import weave
>>> weave.init(project_name="my-project")
>>> real_dataset = weave.ref("my-real-dataset").get()
>>> dataset = weave.Dataset(name="dataset_size_33", rows=[real_dataset.rows[0]] * 33)
>>> weave.publish(dataset)
📦 Published to ...
ObjectRef(entity='...', project='...', name='dataset_size_33', digest='...', extra=())
>>> dataset = weave.Dataset(name="dataset_size_34", rows=[real_dataset.rows[0]] * 34)
>>> weave.publish(dataset)
📦 Published to ...
ObjectRef(entity='...', project='...', name='dataset_size_34', digest='...', extra=())
```
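The 33/34 cutoff was found by trying sizes by hand. If the limit needs to be re-measured (for example with a different example row), a bisection over the row count needs only about log2(N) publish-and-query round trips. This is a minimal sketch: `find_max_ok` and `query_ok` are hypothetical names, and in a real run `query_ok(n)` would publish a Dataset of n copies and try to read it back with the calls shown above. Here it is stubbed to mimic the observed cutoff, and the search assumes failures are monotone in n.

```python
from typing import Callable


def find_max_ok(query_ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest n in [lo, hi] for which query_ok(n) is True.

    Assumes query_ok is monotone (True up to some threshold, False after it)
    and that query_ok(lo) is True.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if query_ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def fake_query_ok(n: int) -> bool:
    # Hypothetical stand-in for "publish n copies of the row, then query them";
    # it simply mimics the 33-row cutoff reported in this issue.
    return n <= 33


print(find_max_ok(fake_query_ok, 1, 100))  # -> 33
```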

Examples

Real data

```python
>>> import weave
>>> weave.init(project_name="my-project")
>>> len(weave.ref('dataset_size_33').get().rows)
33
>>> len(weave.ref('dataset_size_34').get().rows)
Traceback (most recent call last):
...
    raise requests.HTTPError(
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: /table/query. Reason: 
```
[Screenshots: 2024-10-02 at 11:25:37, 11:25:39, 11:32:54, 11:32:58, and 11:25:45]

Toy data

Confirmed working even with 100 examples:

```python
>>> import weave
>>> weave.init(project_name="my-project")
>>> len(weave.ref('dataset_size_100:v0').get().rows)
100
```
[Screenshot: 2024-10-02 at 11:19:24]
TeoZosa changed the title from `[bug] "Large" Datasets can't be stored` to `[bug] "Large" Datasets can't be queried` on Oct 2, 2024.
@jamie-rasmussen (Collaborator)

Tracking internally as https://wandb.atlassian.net/browse/WB-21344

@tssweeney (Collaborator)

Thanks @TeoZosa, you are indeed correct that we are investing in this path. Next up is converting our response to a streaming response, which will avoid this bug. Will keep you posted on priority.
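The thread does not say what wire format the streaming response will use, so this is only a generic sketch of the idea, assuming newline-delimited JSON (one row per line). The server yields rows one at a time instead of building one large body, and the client decodes each row as soon as its line is complete, so neither side needs the whole table in memory at once. `stream_rows`, `rechunk`, and `parse_ndjson` are hypothetical names, not Weave APIs.

```python
import json
from typing import Iterable, Iterator


def stream_rows(rows: Iterable[dict]) -> Iterator[bytes]:
    """Server side (sketch): emit one JSON-encoded row per line instead of one big JSON array."""
    for row in rows:
        yield (json.dumps(row) + "\n").encode("utf-8")


def rechunk(parts: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Simulate the transport delivering the byte stream in fixed-size chunks
    that ignore row boundaries."""
    buf = b""
    for part in parts:
        buf += part
        while len(buf) >= size:
            yield buf[:size]
            buf = buf[size:]
    if buf:
        yield buf


def parse_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Client side (sketch): decode each row as soon as its full line has arrived."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Every element except the last is a complete line; the last may be partial.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line:
                yield json.loads(line)
    if pending.strip():  # tolerate a final row without a trailing newline
        yield json.loads(pending)


rows = [{"id": i, "text": "x" * 50} for i in range(34)]
received = list(parse_ndjson(rechunk(stream_rows(rows), size=64)))
print(received == rows)  # -> True
```

The `list(...)` call at the end collects everything only to check the round trip; a real client would handle each row inside a `for` loop so that memory use stays close to one row plus one chunk.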
