I’m creating a model where I try to dynamically add embeddings. I’ve now resorted to creating a fixed-size embedding tensor from the start and just copying the new ones in. This larger embedding matrix naturally requires more memory to store, but that by itself is not a problem.
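
To make it concrete, here is roughly what I mean (a minimal sketch; the sizes and the `add_embedding` helper are just placeholders for illustration):

```python
import torch
import torch.nn as nn

# Preallocate a fixed-size embedding table (hypothetical sizes).
MAX_ROWS, DIM = 100_000, 128
embedding = nn.Embedding(MAX_ROWS, DIM)
num_used = 1_000  # rows currently in use

def add_embedding(new_vector: torch.Tensor) -> int:
    """Copy a new row into the next free slot and return its index."""
    global num_used
    with torch.no_grad():  # plain data copy, not part of the autograd graph
        embedding.weight[num_used] = new_vector
    idx = num_used
    num_used += 1
    return idx
```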

Where I do run into problems is during the backward pass of the model. I assumed that the actual size of the embedding matrix would not matter there, since only a fixed number of indices is selected and updated.

However, I noticed that during the backward pass my memory usage suddenly doubled, just to accommodate the embedding matrix.

My question is thus as follows:

Does the backward pass on an embedding matrix update **all** the rows separately, or only the ones that are actually indexed? If the former, would it not be more memory-efficient to create a temporary embedding matrix by copying the required rows out of the full matrix into a smaller tensor, use that for the forward and backward passes, and then reinsert the updated rows into the full embedding matrix? (A sketch of this idea is below.)

Otherwise this seems like a big waste of memory.
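
Here is a sketch of the workaround I have in mind (untested; the manual SGD step, learning rate, and the placeholder loss are hypothetical, just to show the flow):

```python
import torch
import torch.nn as nn

MAX_ROWS, DIM = 100_000, 128
embedding = nn.Embedding(MAX_ROWS, DIM)

# Indices of the rows actually used in this batch (hypothetical values).
batch_indices = torch.tensor([3, 17, 42, 17])
unique_idx, local_idx = torch.unique(batch_indices, return_inverse=True)

# Pull only the needed rows into a small trainable leaf tensor.
sub_weight = embedding.weight[unique_idx].detach().clone().requires_grad_(True)

# Run the forward and backward passes against the small tensor only.
vectors = sub_weight[local_idx]
loss = vectors.sum()  # placeholder loss
loss.backward()       # sub_weight.grad has shape (len(unique_idx), DIM)

# Apply the update and write the rows back into the full matrix.
with torch.no_grad():
    sub_weight -= 0.1 * sub_weight.grad        # hypothetical SGD step
    embedding.weight[unique_idx] = sub_weight  # reinsert updated rows
```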

Thanks!