Autograd memory usage for nn.Embedding

I’m creating a model where I try to dynamically add embeddings. I’ve now resorted to creating a fixed-size embedding tensor from the start and copying the new embeddings into it as needed. This large embedding matrix naturally requires more memory to store, but that by itself is not a problem.

Where I do run into problems is during the backward pass of the model. I assumed that the actual size of the embedding matrix would not matter here, since only a fixed number of indices is selected and changed.
However, I noticed that during the backward pass my memory usage suddenly doubled, just to accommodate the embedding matrix.

My question is thus as follows:
Does the backward pass on an embedding matrix compute gradients for all rows, or only for the ones that are actually indexed? If the former, would it not be more efficient to create a temporary embedding matrix by copying the required rows out of the full matrix into a smaller tensor, using that for the forward and backward passes, and then reinserting the updated rows into the embedding matrix afterwards?
It seems like a big waste of memory otherwise.
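For illustration, here is a minimal sketch of the copy-out approach I mean (the tensor names are just made up for this example):

```python
import torch

num_embeddings, dim = 100_000, 64
# Full embedding matrix; kept outside autograd (requires_grad=False)
weight = torch.randn(num_embeddings, dim)

indices = torch.tensor([3, 17, 42])

# Copy only the needed rows into a small trainable tensor
small = weight[indices].clone().requires_grad_(True)

# Forward/backward only touch the small tensor
out = small.sum()
out.backward()

# The gradient tensor is only as large as the selected rows,
# not the full (100_000, 64) matrix
print(small.grad.shape)  # torch.Size([3, 64])

# Reinsert the rows (e.g. after an optimizer step on `small`)
with torch.no_grad():
    weight[indices] = small.detach()
```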


Yes, that can be a big waste of memory.
For this reason, nn.Embedding has a sparse option to return sparse gradients, and the documentation includes some notes on how to use it.
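A quick sketch of the difference (note that sparse gradients only work with a subset of optimizers, e.g. optim.SGD, optim.SparseAdam, and optim.Adagrad):

```python
import torch
import torch.nn as nn

emb_dense = nn.Embedding(100_000, 64)              # default: dense gradients
emb_sparse = nn.Embedding(100_000, 64, sparse=True)

idx = torch.tensor([[1, 2, 5]])

emb_dense(idx).sum().backward()
emb_sparse(idx).sum().backward()

# Dense: the gradient tensor has the full (100_000, 64) shape,
# even though only three rows were indexed
print(emb_dense.weight.grad.shape)       # torch.Size([100000, 64])
print(emb_dense.weight.grad.is_sparse)   # False

# Sparse: the gradient only stores entries for the indexed rows
print(emb_sparse.weight.grad.is_sparse)  # True
```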

Best regards