I’m sort of a beginner in PyTorch and I’m trying to implement an implicit matrix factorization model that uses Alternating Least Squares (ALS). I’m trying to see whether it’s even possible to implement ALS as an optimizer or loss function, because I’m not seeing a way to temporarily “fix” one embedding while computing the gradient of the other.
Any thoughts or input are much appreciated.
As you have two separate (sets of) parameters to train, what you’d typically do is define two optimizers (one for users, one for items) and set the other’s requires_grad to False when computing an update for either.
The next question could be whether you get significantly worse results when computing simultaneous updates instead of alternating ones. I must admit I’d probably try that, too.
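The two-optimizer scheme described above could be sketched roughly as follows. This is only a minimal illustration, not a full ALS implementation: it alternates plain gradient steps rather than solving the least-squares problem in closed form, and the sizes, learning rate, synthetic `target` matrix, and the helper names `loss_fn` and `alternating_step` are all assumptions made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
n_users, n_items, dim = 50, 40, 8

user_emb = nn.Embedding(n_users, dim)
item_emb = nn.Embedding(n_items, dim)

# One optimizer per parameter set, as suggested above.
opt_users = torch.optim.SGD(user_emb.parameters(), lr=0.1)
opt_items = torch.optim.SGD(item_emb.parameters(), lr=0.1)

# Synthetic implicit-feedback matrix (1 = interaction observed); made-up data.
target = (torch.rand(n_users, n_items) < 0.1).float()

def loss_fn():
    # Dense squared error between predicted scores and the target matrix.
    pred = user_emb.weight @ item_emb.weight.t()
    return ((pred - target) ** 2).mean()

def alternating_step(train_emb, frozen_emb, optimizer):
    # Freeze one embedding so autograd only computes gradients for the other.
    frozen_emb.weight.requires_grad_(False)
    train_emb.weight.requires_grad_(True)
    optimizer.zero_grad()
    loss = loss_fn()
    loss.backward()
    optimizer.step()
    return loss.item()

initial_loss = loss_fn().item()
for epoch in range(50):
    alternating_step(user_emb, item_emb, opt_users)  # update users, items fixed
    alternating_step(item_emb, user_emb, opt_items)  # update items, users fixed
final_loss = loss_fn().item()
```

To try the simultaneous variant instead, one could leave `requires_grad` enabled on both embeddings and step both optimizers after a single `backward()` call, then compare the resulting losses.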
Thanks a lot, Tom, for getting me started on this! The reason I’m even considering ALS is that SGD with a small batch size is very slow on a single GPU, which makes sense: small batches don’t make full use of the GPU. With a larger batch size, SGD becomes blazing fast, but my model converges to worse local minima, presumably because larger batches give less noisy gradient estimates, and that noise is what helps escape local minima.
Do you think I should continue pursuing ALS?