How to let ALL trainable variables in an embedding matrix get zero gradients before backpropagation

Hello,

I have project code that uses an idea similar to the embedding matrix in NLP. I have several extended "embedding matrices"; all variables in them are trainable, not fixed. One extended embedding matrix is set up with code like:
for address, t in self.address_embeddings.items():
    self.register_parameter('address_embeddings' + address, Parameter(t))
for address, sample_embedding_layer in self._sample_embedding_layers.items():
    self.add_module('sample_embedding_layer({})'.format(address), sample_embedding_layer)

where each sample_embedding_layer is a small fully connected module built from an nn.Module subclass using nn.Linear() and F.relu(). Here, an "address" plays the same role as a "word" in a word embedding matrix.
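For reference, each sample_embedding_layer has roughly this shape (a simplified sketch; the class name and the layer sizes here are placeholders, not my real code):

import torch.nn as nn
import torch.nn.functional as F

class SampleEmbeddingLayer(nn.Module):
    # Rough shape of one per-address embedding layer: a single hidden
    # FC layer with a ReLU, mapping a raw sample to an embedding vector.
    def __init__(self, input_dim=8, hidden_dim=16, embedding_dim=32):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, embedding_dim)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))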
The RNN core then takes an input sequence that touches only some of the "addresses". However, I found that for the "addresses" not used by the current input sequence, the gradients are None instead of zero-valued. I guess the trainable variables in these extended embedding matrices are not included in model.parameters(), so optimizer.zero_grad() cannot help.
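This is roughly how I check which variables actually receive gradients after backward() (a small helper sketch; report_grads is just a name I made up, and `model` stands for my top-level module):

def report_grads(model):
    # Print, for every registered parameter, whether it received a
    # gradient after backward() or whether .grad is still None.
    for name, p in model.named_parameters():
        if p.grad is None:
            print(name, 'grad is None')
        else:
            print(name, 'grad norm = {:.4g}'.format(p.grad.norm().item()))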
If I want to pass these trainable variables to the optimizer explicitly, e.g.

optim.SGD([{"params": model.parameters()},
           {"params": [address_embedding]},
           {"params": sample_embedding_layer.parameters()}])

or similar, do you think it will work, so that I get zero-valued gradients for the untouched "addresses" in the embedding matrix? What is the right way to do this?
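To make the idea concrete, here is a minimal self-contained toy sketch of what I have in mind; every name and size is a placeholder, not my real model:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 4)                            # stands in for the main model / RNN core
address_embedding = nn.Parameter(torch.randn(4))   # one entry of an extended embedding matrix
sample_embedding_layer = nn.Linear(4, 4)           # one small FC embedding layer

optimizer = optim.SGD(
    [{"params": model.parameters()},
     {"params": [address_embedding]},
     {"params": sample_embedding_layer.parameters()}],
    lr=0.1)                                        # placeholder learning rate

optimizer.zero_grad()  # will the untouched entries now get zero grads instead of None?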
Thanks a lot!