# Models

# Candidate Generation

# Collaborative Filtering

User Based

In the image below, 0 means the user didn't watch the movie and -1 means the user hated it.

user_user_cf_ratings

Users rate on different scales (some are generous, some are harsh), so we subtract each user's average rating from their ratings.

user_user_cf_mean_ratings

How much user D might like video 3:

user_user_cf_sample_calc

Find users similar to the active user based on the intersection of their watches.

  • Pro: Great diversity in recommendations
  • Con: Changing user behaviours
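As a rough illustration of the user-based steps above (mean-centering, finding similar users, predicting a missing rating), here is a minimal sketch. The ratings table, the 1-5 scale, the use of `None` for unwatched videos, cosine similarity, and the neighbourhood size `k` are all assumptions made for the example and are not taken from the images.

```python
import math

# Hypothetical 1-5 ratings; None marks a video the user has not watched.
ratings = {
    "A": [5, 4, None, 1],
    "B": [4, None, 5, 2],
    "C": [1, 2, 1, 5],
    "D": [5, 5, None, 1],
}

def mean_center(row):
    """Subtract the user's average rating from each rating they gave."""
    seen = [r for r in row if r is not None]
    mean = sum(seen) / len(seen)
    return [None if r is None else r - mean for r in row], mean

def cosine(a, b):
    """Cosine similarity; unwatched entries count as 0 after centering."""
    a = [0.0 if x is None else x for x in a]
    b = [0.0 if x is None else x for x in b]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    centered = {u: mean_center(row) for u, row in ratings.items()}
    target_row, target_mean = centered[user]
    neighbours = []
    for other, (row, _) in centered.items():
        if other == user or row[item] is None:
            continue
        sim = cosine(target_row, row)
        if sim > 0:  # keep only positively correlated users
            neighbours.append((sim, row[item]))
    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return target_mean  # no usable neighbours: fall back to the user's mean
    weighted = sum(sim * dev for sim, dev in top)
    return target_mean + weighted / sum(sim for sim, _ in top)

# Index 2 is "video 3" in this toy table.
print(predict(ratings, "D", 2))
```

In this toy table only user B both watched video 3 and resembles user D, so the prediction is D's own average plus B's deviation from B's average, which comes out at roughly 5.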

Item Based

item_item_cf_example

Find items similar to the active item based on the intersection of watches.

  • Pro: Item similarity can be computed once, because the rate of increase in items is lower than in users
  • Pro: Items are easier to categorize
  • Con: Lack of diversity
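A minimal sketch of the item-based idea, assuming Jaccard overlap of watcher sets as the similarity measure (the notes only say "intersection of watches", so the exact measure is an assumption). The `watches` data and function names are hypothetical. The similarity table is built once and then reused for lookups.

```python
# Hypothetical watch log: item -> set of users who watched it.
watches = {
    "movie_1": {"A", "B", "C"},
    "movie_2": {"A", "B"},
    "movie_3": {"C", "D"},
    "movie_4": {"D"},
}

def jaccard(a, b):
    """Size of the intersection of watchers divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_item_similarity(watches):
    """Precompute item-item similarity once; reuse it for every request."""
    items = sorted(watches)
    return {
        i: {j: jaccard(watches[i], watches[j]) for j in items if j != i}
        for i in items
    }

def similar_items(similarity, item, top_n=2):
    """Return the top_n most similar items, ties broken by item name."""
    scores = similarity[item]
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]

similarity = build_item_similarity(watches)
print(similar_items(similarity, "movie_1"))
```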

Collaborative filtering comes in two types: Nearest Neighbor and Matrix Factorization.

# Collaborative Filtering: Nearest Neighbor

collab_filtering_1

collab_filtering_2

# Collaborative Filtering: Matrix Factorization

collab_filtering_matrix_1 collab_filtering_matrix_2

  • Initialize the user and movie vectors randomly
  • Compute the predicted feedback u_i · m_j for each known user-movie feedback value f_ij
  • The difference between the actual feedback (f_ij) and the predicted feedback (u_i · m_j) is the error
  • Use stochastic gradient descent to update the user and movie latent factors so that the error shrinks
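The steps above can be sketched in plain Python as follows. This is an illustration under assumptions rather than a tuned implementation: the feedback values (+1 liked, -1 hated), the latent dimension `k`, the learning rate `lr`, and the L2 regularization term `reg` are all choices made for the example.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mse(feedback, U, M):
    """Mean squared error over the known user-movie feedback values."""
    return sum((f - dot(U[i], M[j])) ** 2 for (i, j), f in feedback.items()) / len(feedback)

def train_mf(feedback, n_users, n_movies, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize user and movie vectors randomly.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    M = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_movies)]
    samples = list(feedback.items())
    for _ in range(epochs):
        rng.shuffle(samples)
        for (i, j), f_ij in samples:
            # Steps 2 and 3: predicted feedback u_i . m_j and its error.
            err = f_ij - dot(U[i], M[j])
            # Step 4: one SGD update of both latent vectors.
            for d in range(k):
                u, m = U[i][d], M[j][d]
                U[i][d] += lr * (err * m - reg * u)
                M[j][d] += lr * (err * u - reg * m)
    return U, M

# Hypothetical known feedback: (user index, movie index) -> value.
feedback = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): 1.0,
            (1, 2): -1.0, (2, 1): 1.0, (2, 2): 1.0}
U, M = train_mf(feedback, n_users=3, n_movies=3)
print(mse(feedback, U, M))
```

Once trained, `dot(U[i], M[j])` gives a predicted feedback value for any user-movie pair, including pairs that were not in `feedback`.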

# Other Notes

Time decay: as time goes on, put less weight on a rating. Apply more weight to less frequented items.
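One possible way to turn both notes into numbers is sketched below. The exponential half-life form, the 30-day default, and the log-based rarity weight are assumptions chosen for illustration; the notes do not specify a formula.

```python
import math

def time_decay_weight(age_days, half_life_days=30.0):
    """Weight of a rating that is age_days old; halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def rarity_weight(watch_count, total_users):
    """Larger weight for items that few users have watched (IDF-style)."""
    return math.log(1.0 + total_users / watch_count)

def rating_weight(age_days, watch_count, total_users):
    """Combine both effects by multiplying them (an arbitrary choice)."""
    return time_decay_weight(age_days) * rarity_weight(watch_count, total_users)

print(time_decay_weight(60), rarity_weight(10, 1000), rarity_weight(500, 1000))
```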

# Content Based Filtering

Use characteristics from movie metadata

Two approaches:

a) Recommend movies similar to those the user has interacted with in the past

content_filtering_approach_a

b) Build a user profile from the average of the profiles of the media the user has interacted with, then rank media by cosine similarity to that profile

content_filtering_approach_b_1 content_filtering_approach_b_2
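Approach b can be sketched as follows. The three-feature media profiles, the item names, and the watch history are hypothetical; in practice the profile vectors would come from the movie metadata.

```python
import math

# Hypothetical media profiles: [action, romance, sci-fi].
media = {
    "m1": [1.0, 0.0, 1.0],
    "m2": [1.0, 0.0, 0.0],
    "m3": [0.0, 1.0, 0.0],
    "m4": [1.0, 0.0, 1.0],
}

def average(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(watched, media, top_n=2):
    """Rank unwatched media by cosine similarity to the user's profile."""
    profile = average([media[m] for m in watched])
    candidates = [m for m in media if m not in watched]
    return sorted(candidates, key=lambda m: (-cosine(profile, media[m]), m))[:top_n]

print(recommend(["m1", "m2"], media))
```

Here the user profile is the mean of `m1` and `m2`, so `m4` (action and sci-fi) ranks ahead of `m3` (romance only).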

# Embeddings using Deep Learning

You set up the network as two towers:

  • Tower 1: media-only sparse and dense features
  • Tower 2: user-only sparse and dense features

The activation of the first tower’s last layer will form the media’s vector embedding (m).

The activation of the second tower’s last layer will form the user’s vector embedding (u).

The combined optimization function at the top aims to minimize the distance between the dot product of u and m (predicted feedback) and the actual feedback label.

two_tower
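To make the data flow concrete, here is a forward-pass-only sketch of a two-tower setup in plain Python. It is not a trainable model: the layer sizes, the random weights, the single hidden layer per tower, the squared-error loss, and the assumption that sparse features are already encoded into the input vectors are all illustrative. A real system would use a deep learning framework and learn the weights by backpropagation.

```python
import random

def dense(x, W, b, relu=True):
    """One fully connected layer, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, o) for o in out] if relu else out

def init_layer(rng, n_in, n_out):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def make_tower(rng, n_in, n_hidden, n_embed):
    return [init_layer(rng, n_in, n_hidden), init_layer(rng, n_hidden, n_embed)]

def embed(tower, features):
    """The activation of the tower's last layer is the embedding."""
    (W1, b1), (W2, b2) = tower
    hidden = dense(features, W1, b1, relu=True)
    return dense(hidden, W2, b2, relu=False)

def loss(user_tower, media_tower, user_x, media_x, label):
    """Squared distance between dot(u, m) and the actual feedback label."""
    u = embed(user_tower, user_x)
    m = embed(media_tower, media_x)
    predicted = sum(a * b for a, b in zip(u, m))
    return (predicted - label) ** 2

rng = random.Random(0)
media_tower = make_tower(rng, n_in=4, n_hidden=8, n_embed=3)  # Tower 1
user_tower = make_tower(rng, n_in=5, n_hidden=8, n_embed=3)   # Tower 2
media_x = [1.0, 0.0, 0.3, 0.7]        # hypothetical media-only features
user_x = [0.0, 1.0, 0.2, 0.5, 0.9]    # hypothetical user-only features
print(loss(user_tower, media_tower, user_x, media_x, label=1.0))
```

Because the two towers never see each other's features, media embeddings can be computed ahead of time and compared against a user embedding at request time.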

# Ranking

To predict the actual rating, you can use a network of the type shown below.

ranking_nn_model_1

ranking_nn_model_2

# Model Comparisons

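Collaborative Filtering

  • Pro: Does not require domain knowledge to create user and media profiles
  • Con: Cold start for new users and new media
  • Con: Echo chamber (recommendations keep reinforcing what users already watch)
  • Con: Shilling attack (fake ratings injected to promote or demote an item)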

Content Based Filtering

  • Pro: Can provide recommendations to a cold-start user
  • Pro: A new media profile can be built immediately