
SasRec tutorial #186

Merged

Conversation

spirinamayya (Contributor)

Added SASRec tutorial

@blondered (Collaborator)

What does `os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"` do? Let's add a comment explaining it.
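For reference, something along these lines would do (the comment wording is just a suggestion):

```python
import os

# Make cuBLAS deterministic: with this workspace config, CUDA matrix
# multiplications produce bit-identical results across runs. PyTorch requires
# it when torch.use_deterministic_algorithms(True) is enabled, and it must be
# set before the first CUDA call.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```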

@blondered (Collaborator)

Let's process the data exactly the same way as in the baselines tutorial, with no changes in code or outputs.

@blondered (Collaborator)

The image of the model is too small; it's barely readable. Let's please make it more user friendly.
We also need to move it to MTS boards, since Miro is going to be shut off soon.
Loss should be dropped from the scheme, because we are going to support a bunch of losses with the same model architecture.
And we can drop the timeline mask, because we plan to stop multiplying by it and to start passing it to multi-head attention together with the causal mask (see the sketch below).
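A minimal sketch of what that masking could look like (illustrative PyTorch, not the actual RecTools code; padding is on the right here just to keep the example simple):

```python
import torch

seq_len, dim = 5, 64
# Hypothetical padding flags for one sequence: True marks padded positions.
key_padding_mask = torch.tensor([[False, False, False, True, True]])

# Causal mask: True forbids attending to future positions.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

mha = torch.nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
x = torch.randn(1, seq_len, dim)

# Both masks are passed into attention together; no multiplying the outputs
# by a timeline mask afterwards.
out, _ = mha(x, x, x, attn_mask=causal_mask, key_padding_mask=key_padding_mask)
```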

@blondered (Collaborator)

Let's use the following structure for the tutorial (it differs from the baselines tutorial because we have much more information for the user):

  • Prepare data
  • Model description (just a few paragraphs and maybe an image from the paper, no preprocessing for now). Let's write that it's a causal model, in contrast with BERT4Rec. I can give you some papers to look at; we will also add them to the links.
  • Recommendations (one paragraph, maybe an image). Just to give the overall idea of the model.
  • RecTools implementation (one paragraph on our features). Here we write that we implemented exactly the authors' architecture. We put your image of the net here and write about supported losses (in contrast to the original model, for now we have only cross-entropy loss, but we will support other variants). Everything that is in your "additional details" section goes here too.
  • Model application. You're good here, but let's pick a user with exactly one interaction in this section. Let's show that the user was not present in train at all, and write a few words on why (and when) this is possible. Let's also add an example of a user who was present in the original train dataset but could not get recommendations (his only item is rare and unknown to the model). We have the on_unsupported_targets flag in the recommend method so that we don't get an error (see the sketch after this list).
  • Links
  • Under the hood: Dataset processing (your "Preprocessing")
  • Under the hood: Transformer layers (your "Self-attention block structure")
  • Under the hood: ... (whatever hardcore details we want to show further)
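For the unsupported-targets part, the call could look roughly like this (a sketch assuming a fitted model and a built dataset; the flag values here are an assumption, please check the actual signature):

```python
# `cold_user_id` was not in train at all; `rare_user_id` was in train, but his
# only item is unknown to the model. Both ids are hypothetical.
recos = model.recommend(
    users=[cold_user_id, rare_user_id],
    dataset=dataset,
    k=10,
    filter_viewed=True,
    on_unsupported_targets="warn",  # assumed values: "raise" (default), "warn", "ignore"
)
```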

@blondered (Collaborator)

Let's add a table of contents. Here's an example: https://github.com/MobileTeleSystems/RecTools/blob/experimental/sasrec/examples/8_debiased_metrics.ipynb
Links do not work on GitHub, so we don't add them for now. But we still need to show the structure of the tutorial.

@blondered (Collaborator)

We will also show some basic Lightning functionality in this tutorial and add custom blocks usage. But in the next PRs.

I really liked your Preprocessing section, btw. Looks great.

@blondered (Collaborator) commented Sep 6, 2024

Links:
Turning dross into gold loss: https://arxiv.org/abs/2309.07602
gSASRec: https://arxiv.org/pdf/2308.07192

I think we should also rename SasRec to SASRec everywhere in the tutorial since it's more common. We will rename the class too at some point.

As for the model description, I like the gSASRec paper description from Sasha Petrov here:

"Transformer [38]-based models have recently outperformed other
models in Sequential Recommendation [17, 24–26, 29, 36]. Two
of the most popular Transformer-based recommender models are
BERT4rec [36] and SASRec [17]. The key differences between the
modelsinclude different attention mechanism (bi-directional vs. unidirectional), different training objective (Item Masking vs. Shifted
Sequence), different loss functions (Softmax loss vs. BCE loss), and,
RecSys ’23, September 18–22, 2023, Singapore, Singapore Aleksandr Petrov and Craig Macdonald
importantly, different negative sampling strategies (BERT4Rec does
not use sampling, whereas SASRec samples 1 negative per positive)"

So we can say that SASRec is a transformer-based sequential model with a unidirectional attention mechanism and a "Shifted Sequence" training objective.
In our implementation we don't provide negative sampling for now and use softmax loss instead.
Also, you need to explain in words what is happening in the model: we take the item embeddings of the user's interaction sequence and feed them to multi-head self-attention to acquire a latent representation of the user sequence. You can rephrase this, but you need to explain it anyway :) A rough sketch of this flow is below.
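Something like this schematic sketch could support that explanation (heavily simplified, not our actual implementation):

```python
import torch


class SASRecSketch(torch.nn.Module):
    """Schematic SASRec flow: item embeddings -> causal self-attention -> scores."""

    def __init__(self, n_items: int, d: int = 64, max_len: int = 512) -> None:
        super().__init__()
        self.item_emb = torch.nn.Embedding(n_items, d)
        self.pos_emb = torch.nn.Embedding(max_len, d)
        layer = torch.nn.TransformerEncoderLayer(d, nhead=2, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        seq_len = item_ids.size(1)
        # Item embeddings of the user's interaction sequence plus positional ones.
        x = self.item_emb(item_ids) + self.pos_emb(torch.arange(seq_len))
        # Unidirectional (causal) self-attention: each step sees only the past.
        causal = torch.nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.encoder(x, mask=causal)  # latent representation per position
        # "Shifted sequence" objective: logits at step t score the item at t + 1;
        # in our implementation this is trained with softmax (cross-entropy) loss.
        return h @ self.item_emb.weight.T
```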

@blondered (Collaborator)

Let's fix the image and we are merging

@blondered blondered self-requested a review September 13, 2024 12:09
@blondered blondered merged commit afd0463 into MobileTeleSystems:experimental/sasrec Sep 13, 2024
7 checks passed
@spirinamayya spirinamayya deleted the tutorial/sasrec2 branch October 1, 2024 11:17