I am using a simple NN with an embedding layer. The embedding matrix is quite large (vocab × dim ≈ 130,000 × 300).
The backprop step is very slow, and it seems to take roughly the same amount of time (~36 s) no matter what batch size I use (5 or 2000).
Does this mean that autograd computes gradients for all embedding vectors, regardless of whether they were actually used in the batch?
Is there a workaround for this?
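For reference, here is a minimal sketch of what I am measuring. It assumes PyTorch (since I mentioned autograd); the model, shapes, and names are illustrative stand-ins for my real network, not my actual code:

```python
# Minimal repro sketch (assumed: PyTorch, a plain nn.Embedding followed by a
# small linear head; all names/shapes are illustrative).
import time
import torch
import torch.nn as nn

vocab_size, emb_dim = 130_000, 300

model = nn.Sequential(
    nn.Embedding(vocab_size, emb_dim),  # large embedding table
    nn.Flatten(start_dim=1),            # (batch, 1, 300) -> (batch, 300)
    nn.Linear(emb_dim, 2),
)

for batch_size in (5, 2000):
    # One token index per example, drawn from the full vocabulary.
    tokens = torch.randint(0, vocab_size, (batch_size, 1))
    loss = model(tokens).sum()
    start = time.time()
    loss.backward()  # this is the step whose time barely changes with batch size
    print(f"batch={batch_size}: backward took {time.time() - start:.3f}s")
```

After `backward()`, the gradient tensor on the embedding weight has the full `vocab_size × emb_dim` shape even though only a handful of rows were touched by the batch, which is what made me suspect the dense-gradient behavior above.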