Changing weight after forward and before backward

Is it ok to change weights after the forward pass and before the backward pass?

Could you be more specific about how you change the weights?

Yes. I set the weights to some constant pre-defined matrix (for instance a matrix of ones or a matrix with random entries).

Did you do it as layer.weight = xxx or something like layer.weight.data.copy_(xxx)?

layer.weight.data = torch.randn(layer.weight.data.size())

It seems that if I use what you suggested,
layer.weight.data.copy_(torch.ones(layer.weight.data.size()))
then it works, but if I use
layer.weight.data.copy_(torch.sign(torch.randn(layer.weight.data.size())))
then the backprop returns NaN.

Never change the .data attribute of a Variable; it breaks the invariants that autograd relies on.

Yeah, it breaks precisely because of that. One should never operate directly on the .data of a Variable.

layer.weight = xxx will be fine.
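
For concreteness, a minimal sketch of that reassignment, assuming a plain nn.Linear layer (the module rejects bare tensors for parameter attributes, so the new value has to be wrapped in nn.Parameter):

import torch
import torch.nn as nn

layer = nn.Linear(4, 3)
# Rebinding the attribute requires an nn.Parameter; nn.Module will not
# accept a plain tensor for a registered parameter slot.
layer.weight = nn.Parameter(torch.ones(layer.weight.size()))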

My goal is to change the weights after the forward pass but before the backward pass.
If I use layer.weight = xxx then the backward pass ignores the new weights and uses the old ones. It seems that directly writing to layer.weight.data, which you advise against, is the only option that works.
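
Here is a small repro sketch of what I mean (a made-up nn.Linear layer): the backward pass still differentiates through the parameter that was recorded during the forward pass, so the gradients land on the old weight and the new one is ignored.

import torch
import torch.nn as nn

layer = nn.Linear(4, 3)
x = torch.randn(2, 4)
out = layer(x)                # the graph records the current weight tensor
old_weight = layer.weight
# Rebinding the attribute does not touch the tensor recorded in the graph.
layer.weight = nn.Parameter(torch.ones_like(old_weight))
out.sum().backward()
print(old_weight.grad is not None)   # True: gradients went to the old parameter
print(layer.weight.grad)             # None: the new parameter never took part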

The problem is that the weights can be used when calculating their own gradients, so if you modify the weights before the backward pass then the calculated gradients could be wrong, but maybe that is what you want.
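
For example, in a made-up two-layer sketch (using .data here only to demonstrate the effect), the gradient of the first layer's weights depends on the second layer's weights, so overwriting them between forward and backward silently changes the result:

import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
x = torch.randn(2, 4)

# Reference run: forward and backward with untouched weights.
net.zero_grad()
net(x).sum().backward()
reference_grad = net[0].weight.grad.clone()

# Same forward pass, but overwrite the second layer's weights before backward.
net.zero_grad()
out = net(x)
net[1].weight.data.copy_(torch.sign(torch.randn_like(net[1].weight)))
out.sum().backward()

# The gradient flowing back through layer 1 was computed with the new layer-2
# weights, so the first layer's gradient no longer matches the reference.
print(torch.allclose(reference_grad, net[0].weight.grad))   # almost surely False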

Only by examining the calculation that the weights are involved in, and their size during the forward pass, could we hope to understand why you are getting NaNs. If you are trying to implement a paper, it may help us to know which one.

See @jpeg729's reply below. Also, the gradients won't be valid if you change the weights like that. If you want to apply gradients computed with one set of params to another set, just copy .grad after backward.
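
A rough sketch of that last suggestion (the names are made up, assuming a second parameter set of the same shape): run forward and backward with the first set, then copy the resulting .grad onto the second set.

import torch
import torch.nn as nn

layer = nn.Linear(4, 3)                        # params used for forward/backward
other_weight = nn.Parameter(torch.ones(3, 4))  # params that should receive the grads

x = torch.randn(2, 4)
layer(x).sum().backward()

# Transfer the gradient computed with layer.weight onto the other parameter set.
other_weight.grad = layer.weight.grad.clone()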