[QST] The batch generation in MovieLens example produces batches in an unexpected way #1101

Open
zainkhan-afk opened this issue Jul 2, 2024 · 2 comments
Labels
question Further information is requested

Comments


zainkhan-afk commented Jul 2, 2024

❓ Questions & Help

Details

I was running the Getting Started With MovieLens example for PyTorch. When I created the dataloader and generated a single batch, the output differed from what was expected.

I got:

({'userId': tensor([ 8528, 39453, 50328,  ..., 59406, 59579, 12128], device='cuda:0'),
  'movieId': tensor([1175,  387,   12,  ...,   23,  934, 1738], device='cuda:0'),
  'genres__values': tensor([5, 6, 5,  ..., 9, 5, 3], device='cuda:0'),
  'genres__offsets': tensor([    0,     2,     4,  ..., 88830, 88833, 88835], device='cuda:0',
         dtype=torch.int32)},
 tensor([0., 0., 1.,  ..., 1., 1., 1.], device='cuda:0'))

Expected:

({'genres': (tensor([1, 2, 6,  ..., 8, 1, 4], device='cuda:0'),
   tensor([[    0],
           [    1],
           [    3],
           ...,
           [88555],
           [88556],
           [88557]], device='cuda:0', dtype=torch.int32)),
  'userId': tensor([[1691],
          [1001],
          [ 967],
          ...,
          [ 848],
          [1847],
          [5456]], device='cuda:0'),
  'movieId': tensor([[ 332],
          [ 154],
          [ 245],
          ...,
          [3095],
          [1062],
          [3705]], device='cuda:0')},
 tensor([1., 1., 0.,  ..., 1., 1., 0.], device='cuda:0'))

Docker: nvcr.io/nvidia/merlin/merlin-pytorch:nightly
Notebook: 03-Training-with-PyTorch.ipynb

I read in the documentation that a multi-hot (multi-valued) column is represented by two tensors (values and nnzs). In my case, the two tensors are stored under separate keys (genres__values and genres__offsets) rather than as a tuple under a single key.

How can I get the batches in the required format?
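
One possible workaround is to regroup the flat keys by hand. The sketch below is not a Merlin API: `regroup_ragged` and `row_lengths` are hypothetical helper names, and the code assumes that the `__offsets` entry holds one more element than there are rows (the last element being the total number of values, as the output above suggests) and that the second element of the expected tuple holds the start offset of each row. Plain Python lists stand in for tensors so the layout is easy to inspect.

```python
def row_lengths(offsets):
    """Number of values in each row, derived from consecutive offsets."""
    return [end - start for start, end in zip(offsets[:-1], offsets[1:])]


def regroup_ragged(features):
    """Merge `<name>__values` / `<name>__offsets` pairs into one tuple per column.

    Assumption: offsets has len(rows) + 1 entries, so dropping the last
    entry leaves the start position of every row.
    """
    grouped = {}
    for key, data in features.items():
        if key.endswith("__offsets"):
            continue  # consumed together with the matching __values key
        if key.endswith("__values"):
            name = key[: -len("__values")]
            offsets = list(features[name + "__offsets"])
            grouped[name] = (list(data), offsets[:-1])
        else:
            grouped[key] = data  # scalar columns pass through unchanged
    return grouped


# Toy batch with three rows; the numbers are made up for illustration.
features = {
    "userId": [8528, 39453, 50328],
    "movieId": [1175, 387, 12],
    "genres__values": [5, 6, 5, 9, 5, 3, 7],
    "genres__offsets": [0, 2, 4, 7],
}

grouped = regroup_ragged(features)
lengths = row_lengths(features["genres__offsets"])
```

Here `grouped["genres"]` is the pair of the flat values and the per-row start offsets, and `lengths` gives the count of genres per row. With real tensors the same slicing should carry over (for example `offsets[1:] - offsets[:-1]` for the row lengths), but whether the notebook's model code accepts the regrouped batch is untested.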

@zainkhan-afk
Author

I could not find a solution to this, so I downloaded the TensorFlow Docker image and ran the TensorFlow notebook instead.

@CarloNicolini

In my experience, working with the merlin-torch version has been a poor experience. TensorFlow has much better support, and I see merlin-torch as still very immature.
I had many problems replicating the TensorFlow-based notebooks with Merlin torch. Like you, I abandoned the idea and happily switched to the merlin-tensorflow containers.
