Adagrad and SparseAdam work great for sparse training because they keep a separate running sum for each parameter, so rows that rarely receive gradients are not penalized by updates to frequently seen rows.
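To make the per-parameter sums concrete, here is a minimal pure-Python sketch of one Adagrad step on a toy embedding table. It is only an illustration, not how any library implements it: `sparse_adagrad_step`, `weights`, `accum`, and the `lr`/`eps` defaults are all hypothetical names and values I picked for the example.

```python
import math

def sparse_adagrad_step(weights, accum, sparse_grad, lr=0.1, eps=1e-10):
    """Apply one Adagrad step, touching only rows that have a gradient.

    weights, accum: list of rows (each row is a list of floats)
    sparse_grad: dict mapping row index -> gradient vector for that row
    """
    for row, grad in sparse_grad.items():
        for j, g in enumerate(grad):
            # Each parameter has its own sum of squared gradients.
            accum[row][j] += g * g
            # The step size shrinks only for parameters that were updated.
            weights[row][j] -= lr * g / (math.sqrt(accum[row][j]) + eps)

# Toy embedding table: 4 rows, 2 dimensions, all zeros.
weights = [[0.0, 0.0] for _ in range(4)]
accum = [[0.0, 0.0] for _ in range(4)]

# Only row 1 appears in this batch, so only row 1 is updated.
sparse_adagrad_step(weights, accum, {1: [1.0, -2.0]})
```

Rows that never show up in a batch keep a zero accumulator, so their first real update still gets the full learning rate.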
Are there any other recommended optimizers for embedding training?