I have a custom layer that holds a `Tensor` which should never be modified. The tensor is combined with the input to produce the output, so gradients should still flow backward through the operation. To repeat: the tensor itself must never change.
My original solution was to make it a plain class member. This worked fine until I switched to a multi-GPU setup: unless the tensor is an `nn.Parameter`, it is not moved to the correct GPU device when `model.cuda()` is called.
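For concreteness, here is a minimal sketch of the situation (the layer name `ScaleLayer` and the tensor `const` are made-up placeholders, not my real code):

```python
import torch
import torch.nn as nn

class ScaleLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # plain tensor stored as an ordinary attribute -- it is NOT
        # registered with the module, so model.cuda() will not move it
        self.const = torch.ones(4)

    def forward(self, x):
        # the constant participates in the computation, so gradients
        # should flow back to x through this multiplication
        return x * self.const

model = ScaleLayer()
# the plain attribute is invisible to PyTorch's module machinery:
print('const' in model.state_dict())       # False
print(len(list(model.parameters())))       # 0
```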
I tried setting `requires_grad=False`, but then the network fails to train:

```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
I suspect that I am misunderstanding something fundamental about the `Parameter` class.
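A minimal reproduction of the failure, assuming the frozen parameter is the only tensor in the graph that could require grad (again, names are placeholders):

```python
import torch
import torch.nn as nn

class ScaleLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter is moved by model.cuda(), but with
        # requires_grad=False it contributes no grad_fn to the graph
        self.const = nn.Parameter(torch.ones(4), requires_grad=False)

    def forward(self, x):
        return x * self.const

model = ScaleLayer()
x = torch.ones(4)          # typical data batch: no requires_grad
out = model(x)
# nothing in the graph requires grad, so backward() raises the error
try:
    out.sum().backward()
except RuntimeError as e:
    print(e)   # element 0 of tensors does not require grad ...
```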
Another attempt was to override `torch.nn.Module.cuda()` in my subclass, so that I could assign my layer's data to the correct device myself. It did not work either.
Summary
- I have a custom model with a class-member tensor used in the forward/backward passes.
- The class member should NEVER be modified.
- I made it an `nn.Parameter` so that it is moved to the correct device when `model.cuda()` is called.
- With `requires_grad=True`, it changes during backpropagation.
- With `requires_grad=False`, training crashes with the error above.
Is there a way to create a `Tensor` that will be copied to the correct GPU when `model.cuda()` is called?