
Week 11 and 12


TL;DR

During weeks 11 and 12, I implemented the following evaluation metrics:

  • MAP@k (Mean Average Precision)
  • nDCG@k (Normalized Discounted Cumulative Gain)

Implementing an evaluation metric

  • Create a Metric subclass
  • For your subclass, override the following methods:
    • name(): returns the metric name as a string.
    • eval(ratings, recommendations): computes the metric over the users' training/test ratings and their recommendations.
  • Add your subclass to evaluator/metric2class.py by adding a dictionary entry that maps the metric name (as used in the config file) to its submodule and class name (see the sketch below this list).
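A minimal sketch of these steps, assuming the Metric base class is importable from evaluator.metric and that metric2class.py maps each metric name to a submodule and class name (the module path, dictionary layout, and data structures here are illustrative assumptions, not the framework's exact API):

# evaluator/metrics/precision_at_one.py (hypothetical module)
from evaluator.metric import Metric  # assumed location of the Metric base class


class PrecisionAtOne(Metric):
    def name(self):
        # Metric name as used in the config file
        return "P@1"

    def eval(self, ratings, recommendations):
        # ratings: user -> set of relevant item ids (assumed structure)
        # recommendations: user -> ranked list of recommended item ids (assumed structure)
        hits = 0
        for user, recs in recommendations.items():
            relevant = ratings.get(user, set())
            hits += int(bool(recs) and recs[0] in relevant)
        return hits / max(len(recommendations), 1)


# evaluator/metric2class.py (assumed layout of the registry)
metric2class = {
    # ...
    "P@1": {"submodule": "precision_at_one", "class": "PrecisionAtOne"},
}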

In the .yaml configuration file, the evaluation directive defines how the experiment is evaluated. Each piece of evaluation metadata is given in the format metadata_name: metadata_value. Example:

experiment:
  # ...
  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]

Where:

  • evaluation: specifies the evaluation metadata (mandatory)
    • k: evaluates only the first k recommendations (mandatory)
    • relevance_threshold: rating threshold used to decide whether a rating counts as relevant
    • metrics: list of metric names to be evaluated (mandatory)
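As a rough illustration of how this evaluation block could be consumed, assuming the configuration is plain YAML read with PyYAML (the framework's actual loader may differ, and the file name below is hypothetical):

import yaml  # PyYAML, assumed to be available

with open("experiment.yaml") as f:  # hypothetical config file name
    config = yaml.safe_load(f)

evaluation = config["experiment"]["evaluation"]
k = evaluation["k"]                                           # 5
relevance_threshold = evaluation.get("relevance_threshold")   # 3
metric_names = evaluation["metrics"]                          # ["MAP", "nDCG"]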

Metrics

The goal of this framework is to implement the most commonly used evaluation metrics, following the DaisyRec literature review.

Some of the main metrics are Precision, Recall, Mean Average Precision (MAP), Hit Ratio (HR), Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (nDCG).

For now, the implemented metrics are MAP and nDCG.

MAP@k

The Average Precision at $N$ ($\text{AP@N}$) for a user with $m$ relevant items is given by:

$$ \begin{align*} \textrm{AP@N} = \frac{1}{min(m,N)}\sum_{k=1}^N \textrm{($P(k)$ if $k^{th}$ item was relevant)} = \frac{1}{min(m,N)}\sum_{k=1}^N P(k)\cdot rel(k) \end{align*} $$

Where precision $P$ is given by:

$$P(k) = \frac{\text{number of relevant items in the top } k}{\text{number of recommended items}} = \frac{\text{number of relevant items in the top } k}{k}$$

Finally, the $\text{MAP@N}$ for a set of users $U$ is given by:

$$\begin{align*} \textrm{MAP@N} = \frac{1}{|U|}\sum_{u}^{U}(\textrm{AP@N})_u \end{align*}$$
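These definitions translate directly into code. A minimal sketch, assuming each user's recommendations are given as a ranked list of item ids and their relevant items as a set (helper names and data structures are illustrative):

def average_precision_at_n(recommended, relevant, n):
    # recommended: ranked list of item ids; relevant: set of relevant item ids
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, item in enumerate(recommended[:n], start=1):
        if item in relevant:            # rel(k) = 1
            hits += 1
            precision_sum += hits / k   # P(k)
    return precision_sum / min(len(relevant), n)


def map_at_n(recommendations, relevants, n):
    # recommendations, relevants: dicts keyed by user id
    users = list(recommendations)
    return sum(
        average_precision_at_n(recommendations[u], relevants.get(u, set()), n)
        for u in users
    ) / len(users)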

nDCG@k

The Discounted Cumulative Gain $DCG$ is given by:

$$DCG@k = \sum_{i=1}^{k}\frac{rel_i}{\log_2(i+1)}$$

where $rel_i$ is the relevance of the item at position $i$. The normalization is done by the Ideal Discounted Cumulative Gain at $k$ ($\text{IDCG@k}$), the DCG of the best possible ranking:

$$nDCG@k = \dfrac{\text{DCG@k}}{\text{IDCG@k}}$$

The final overall metric for all users is given by:

$$\frac{1}{|U|}\sum_{u}^{U}(\textrm{nDCG@k})_u$$
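A corresponding sketch with binary relevance ($rel_i = 1$ if the $i$-th recommended item is relevant, $0$ otherwise; graded relevance would only change the gain values):

import math


def dcg_at_k(gains, k):
    # gains: rel_i values in recommendation order
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(gains[:k], start=1))


def ndcg_at_k(recommended, relevant, k):
    # recommended: ranked list of item ids; relevant: set of relevant item ids
    gains = [1.0 if item in relevant else 0.0 for item in recommended]
    ideal = [1.0] * min(len(relevant), k)  # best possible ranking -> IDCG@k
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0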

References

  • Zhu Sun et al., "DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation"
  • Sonya Sawtelle, "Mean Average Precision (MAP) For Recommender Systems"
  • Wikipedia, "Discounted Cumulative Gain"
