I have a custom layer that holds a `Tensor` which should never be modified. The tensor is combined with the input to produce the output, so gradients should still flow backward through the operation. To repeat: the tensor itself must never change.
My original solution was to make it a plain class member. This worked fine until I switched to a multi-GPU setup: unless the tensor is an `nn.Parameter`, it is not moved to the correct GPU device when `model.cuda()` is called.
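For concreteness, here is a minimal sketch of the situation (the layer name `ScaleLayer` and the tensor `const` are made-up placeholders, not my real code):

```python
import torch
import torch.nn as nn

class ScaleLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # plain tensor stored as an ordinary attribute -- it is NOT
        # registered with the module, so model.cuda() will not move it
        self.const = torch.ones(4)

    def forward(self, x):
        # the constant participates in the computation, so gradients
        # should flow back to x through this multiplication
        return x * self.const

model = ScaleLayer()
# the plain attribute is invisible to PyTorch's module machinery:
print('const' in model.state_dict())       # False
print(len(list(model.parameters())))       # 0
```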
I tried setting `requires_grad=False`, but then the network fails to train:

```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
I suspect that I am misunderstanding something fundamental about the `Parameter` class.
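A minimal reproduction of the failure, assuming the frozen parameter is the only tensor in the graph that could require grad (again, names are placeholders):

```python
import torch
import torch.nn as nn

class ScaleLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter is moved by model.cuda(), but with
        # requires_grad=False it contributes no grad_fn to the graph
        self.const = nn.Parameter(torch.ones(4), requires_grad=False)

    def forward(self, x):
        return x * self.const

model = ScaleLayer()
x = torch.ones(4)          # typical data batch: no requires_grad
out = model(x)
# nothing in the graph requires grad, so backward() raises the error
try:
    out.sum().backward()
except RuntimeError as e:
    print(e)   # element 0 of tensors does not require grad ...
```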
Another attempt was to override `torch.nn.Module.cuda()` in my subclass, so that I could assign my layer's data to the correct device myself. It did not work either.
Summary
- I have a custom model with a class-member tensor used in the forward/backward passes.
- The class member should NEVER be modified.
- I made it an `nn.Parameter` so that it is moved to the correct device when `model.cuda()` is called.
- With `requires_grad=True`, it changes during backpropagation.
- With `requires_grad=False`, training crashes with the error above.
Is there a way to create a `Tensor` that will be copied to the correct GPU when `model.cuda()` is called?