I’m implementing a very simple recommender system in PyTorch, using the MovieLens dataset (the one with 20M+ ratings). Here is the GitHub link to the notebook I’m creating: https://github.com/echo66/pytorch-rec-sys/blob/master/step-1.ipynb
I will summarize what I’m doing:
- Each item and each user is described by a vector of latent features, so all the parameters live in two matrices: the user feature matrix and the item feature matrix. Each matrix has F columns, where F is the number of latent features.
- The prediction is generated by the inner product of user and item vectors.
- The loss function is the Mean Squared Error.
- I’m using two L2 regularizers, one for the user features and another for the item features.
- I sorted the dataset by timestamp.
- Due to the lack of good support for general sparse tensors (AFAIK) in PyTorch, for each batch, I’m slicing the matrices. Check this function.
- Batch Size: 2000
- Number of Batches: 8542
- Number of users: 259,137
- Number of items: 165,201
- Number of ratings: 24,404,096
- Training dataset size: 17,082,867
- Test dataset size: 7,321,228
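To make the setup above concrete, here is a minimal sketch of the model as described: two feature matrices, an inner-product prediction, MSE loss, and one L2 penalty per matrix. It is not the notebook's actual code. The toy sizes, the regularization strengths (`reg_users`, `reg_items`), and the names `user_features`, `item_features` and `batch_loss` are all assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy sizes; the real matrices in the post are far larger.
n_users, n_items, n_features = 50, 40, 8
reg_users, reg_items = 0.01, 0.01  # assumed L2 strengths

# One row of latent features per user and per item.
user_features = torch.randn(n_users, n_features, requires_grad=True)
item_features = torch.randn(n_items, n_features, requires_grad=True)


def batch_loss(user_ids, item_ids, ratings):
    """MSE on the batch plus L2 penalties on the rows the batch touches."""
    u = user_features[user_ids]        # (batch, F) slice of user rows
    v = item_features[item_ids]        # (batch, F) slice of item rows
    predictions = (u * v).sum(dim=1)   # row-wise inner product
    mse = torch.mean((predictions - ratings) ** 2)
    penalty = reg_users * u.pow(2).sum() + reg_items * v.pow(2).sum()
    return mse + penalty


# A fake batch of (user, item, rating) triples.
user_ids = torch.randint(0, n_users, (16,))
item_ids = torch.randint(0, n_items, (16,))
ratings = torch.rand(16) * 4.5 + 0.5

loss = batch_loss(user_ids, item_ids, ratings)
loss.backward()
```

One thing this sketch makes visible: even though the batch only touches a few rows, `user_features.grad` and `item_features.grad` come back as dense tensors with the full shape of each matrix.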
Right now, each backward pass takes between 7 and 10 seconds, while each forward pass takes less than 0.5 seconds. Am I doing anything that would be considered naive?
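For reference, this is roughly how the forward and backward timings could be measured separately. It is only a sketch: the helper name `time_forward_backward` and the toy loss are made up for illustration, and the `torch.cuda.synchronize()` calls only matter if the tensors are on a GPU, where kernels run asynchronously.

```python
import time
import torch


def sync_if_cuda():
    # On GPU, wait for queued kernels so wall-clock timings are meaningful.
    if torch.cuda.is_available():
        torch.cuda.synchronize()


def time_forward_backward(loss_fn, *inputs):
    """Return (forward_seconds, backward_seconds) for a single step."""
    sync_if_cuda()
    start = time.perf_counter()
    loss = loss_fn(*inputs)
    sync_if_cuda()
    forward_s = time.perf_counter() - start

    start = time.perf_counter()
    loss.backward()
    sync_if_cuda()
    backward_s = time.perf_counter() - start
    return forward_s, backward_s


# Toy stand-in for the real model: a single feature matrix and a fake batch.
features = torch.randn(1000, 16, requires_grad=True)
row_ids = torch.randint(0, 1000, (200,))
targets = torch.rand(200)


def toy_loss(ids, y):
    predictions = features[ids].sum(dim=1)
    return torch.mean((predictions - y) ** 2)


forward_s, backward_s = time_forward_backward(toy_loss, row_ids, targets)
print(f"forward: {forward_s:.6f}s, backward: {backward_s:.6f}s")
```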
Thanks in advance!