I am using a lot of zero-padding in my batch data, with padding index 0. I want my model to ignore the effect of the padding during training. Could someone check whether my code is correct?
optimizer.zero_grad()
loss.backward()
# Zero the gradient row for the padding index (0) before the update
for name, param in model.named_parameters():
    if param.grad is not None:
        if 'A.' in name or 'W.weight' in name:
            param.grad[0] = 0
optimizer.step()
If you don’t have a very large number of embedding layers, you could address the parameters directly, e.g. “model.emb.weight” if the embedding is named emb.
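As a minimal sketch of addressing the parameter directly (the names here are hypothetical; substitute your own embedding attribute), you could zero the padding row of the gradient without looping over `named_parameters()`:

```python
import torch
import torch.nn as nn

# Toy embedding standing in for model.emb (hypothetical setup)
emb = nn.Embedding(10, 4)

loss = emb(torch.tensor([0, 1, 2])).sum()
loss.backward()

# Zero only the gradient row of the padding index (0),
# so optimizer.step() leaves that embedding vector unchanged
emb.weight.grad[0].zero_()
```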
Also, I would probably follow Soumith’s suggestion to reset the padding embedding vector after the step instead of changing the grad before. Very likely both work, but if you are set to keep them fixed, resetting feels slightly more straightforward than setting the grad to zero in order to change the update to zero.
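A sketch of the resetting approach, again with hypothetical names: zero the padding row once at initialization, train normally, and re-zero that row after each `optimizer.step()`:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)  # stand-in for your model's embedding
optimizer = torch.optim.SGD(emb.parameters(), lr=0.1)

# Fix the padding vector (index 0) to zeros up front
with torch.no_grad():
    emb.weight[0].zero_()

loss = emb(torch.tensor([0, 1, 2])).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Reset the padding vector after the update
with torch.no_grad():
    emb.weight[0].zero_()
```

Note that `nn.Embedding` also accepts a `padding_idx` argument, which keeps the gradient for that index at zero automatically, so neither manual step is needed in that case.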
Thank you for the helpful information. So after updating the parameters, I just reset the embedding weight at the specific index. Is that the same as your suggestion? It looks much simpler.