How can I add a variable in a network definition that gets tuned during training?

Shisho_Sama · December 28, 2018, 6:17am

Is it possible to have a variable inside the network definition that is trainable and gets trained during training?
To give a very simplistic example, suppose I want the momentum or the epsilon of batch normalization to be trained in the network. Can I simply do:

self.batch_mom1 = torch.tensor(0, dtype=torch.float32, device='cuda:0', requires_grad=True) 
self.batch_mom2 = torch.tensor(0, dtype=torch.float32, device='cuda:0', requires_grad=True) 
self.batch_mom3 = torch.tensor(0, dtype=torch.float32, device='cuda:0', requires_grad=True)

model = nn.Sequential(
              nn.Conv2d(3, 66, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
              nn.BatchNorm2d(66, eps=1e-05, momentum=self.batch_mom1.item(), affine=True),
              nn.ReLU(inplace=True),

              nn.Conv2d(66, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
              nn.BatchNorm2d(128, eps=1e-05, momentum=self.batch_mom2.item(), affine=True),
              nn.ReLU(inplace=True),

              nn.Conv2d(128, 192, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
              nn.BatchNorm2d(192, eps=1e-05, momentum=self.batch_mom3.item(), affine=True),
              nn.ReLU(inplace=True)
...

inside my graph and expect the variables to be tuned, since they are set with requires_grad=True?
If not, what is the correct way of doing such things? Should I create a whole new layer for that?

smth · December 29, 2018, 1:01am

No, you can't have it inside your graph and expect it to be tuned.

If you followed the first PyTorch tutorials at https://pytorch.org/tutorials/, you will have learned the concept of an optimizer and how it performs the optimization step (which is what tunes the weights). To tune these variables, you have to give them to an optimizer.

The model's own parameters are normally given to the optimizer through the model.parameters() call. I suggest you go through the tutorial again to get a better understanding.
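A minimal sketch of giving the optimizer an extra tensor alongside model.parameters(), assuming a toy nn.Linear model and random data. The name extra_scale is hypothetical, and multiplying the output by it is only there so the tensor takes part in the loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small model; model.parameters() yields only its weight and bias
model = nn.Linear(4, 1)

# Hypothetical standalone trainable scalar that is NOT part of the model
extra_scale = torch.tensor(1.0, requires_grad=True)

# Hand both the model's parameters and the extra tensor to the optimizer;
# optimizer.step() only updates tensors that appear in this list
optimizer = torch.optim.SGD(list(model.parameters()) + [extra_scale], lr=0.1)

x = torch.randn(8, 4)
target = torch.randn(8, 1)

before = extra_scale.item()
optimizer.zero_grad()
loss = ((extra_scale * model(x) - target) ** 2).mean()
loss.backward()      # computes gradients for the model weights and extra_scale
optimizer.step()     # applies the SGD update to all of them
after = extra_scale.item()

print(before, after)
```

Dropping extra_scale from the list would still give it a .grad after backward(), but its value would never change.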

Shisho_Sama · December 29, 2018, 10:04am

Given the dynamic nature of graphs in PyTorch, I had the impression that adding a variable to the graph would automatically include it in the parameter list, and thus in the optimization process!
It seems my understanding is flawed here.

smth · December 29, 2018, 3:38pm

To help you tell the two apart:

Adding a variable to the graph means its gradients are computed automatically.
Giving the variable to the optimizer invokes the update rule (for SGD it is: x = x - learning_rate * x.grad), which is what tunes the variable.
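A small sketch that separates the two steps, assuming two scalars a and b that both take part in the graph, but only a is given to the optimizer:

```python
import torch

lr = 0.1

# Both tensors are in the graph, so autograd computes gradients for both
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# Only `a` is handed to the optimizer
optimizer = torch.optim.SGD([a], lr=lr)

loss = a * b          # d(loss)/da = b = 3, d(loss)/db = a = 2
loss.backward()
print(a.grad, b.grad)  # tensor(3.) tensor(2.)

optimizer.step()      # applies a = a - lr * a.grad; b is left untouched
print(a.item(), b.item())
```

After the step, a has moved to roughly 2.0 - 0.1 * 3 = 1.7 while b is still 3.0, even though b.grad was computed.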

Shisho_Sama · December 30, 2018, 5:47pm

Thanks a lot, I really appreciate it. However, I need to write a very simple example for myself so that I fully understand how everything works.
There is an example in the docs that shows how this is done manually, that is, by writing the optimization steps by hand (computing the gradients and then updating the variables accordingly).
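A minimal sketch of that manual approach, assuming a toy problem of fitting a single scalar w so that w * x matches y = 3 * x. The data, learning rate and step count are arbitrary choices, not taken from the docs example:

```python
import torch

# Toy data generated from y = 3 * x (the slope 3.0 is arbitrary)
x = torch.linspace(-1.0, 1.0, steps=20)
y = 3.0 * x

w = torch.tensor(0.0, requires_grad=True)  # the variable we want to learn
lr = 0.5

for _ in range(100):
    loss = ((w * x - y) ** 2).mean()
    loss.backward()            # autograd fills w.grad
    with torch.no_grad():      # the update itself must not be recorded in the graph
        w -= lr * w.grad       # plain SGD update rule: w = w - lr * w.grad
    w.grad.zero_()             # gradients accumulate, so clear them before the next step

print(w.item())
```

Here the "optimizer" is the three lines inside the loop; torch.optim.SGD performs the same update for every tensor it was given.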
However, I should still be able to set some parameters in my model, right? That way, when I pass model.parameters() to an optimizer, they get optimized accordingly.
So basically I should be able to create a new Parameter first and then add it to my module by registering it as a module attribute, something like this:

import torch

myvar = torch.tensor(0, dtype=torch.float32, requires_grad=True)
myparam = torch.nn.Parameter(myvar)
# mymodel: an existing nn.Module; param_name: a string such as 'batch_mom1'
mymodel.register_parameter(param_name, myparam)

I assume this should add my new parameter to the module's list of parameters. Is this assumption correct? I ask because, according to the documentation for Parameter:

A kind of Tensor that is to be considered a module parameter.

Parameters are Tensor subclasses, that have a very special property when used with Modules: when they’re assigned as Module attributes they are automatically added to the list of its parameters, and will appear e.g. in parameters() iterator. Assigning a Tensor doesn’t have such effect. This is because one might want to cache some temporary state, like last hidden state of the RNN, in the model. If there was no such class as Parameter, these temporaries would get registered too.
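A minimal sketch that checks this behaviour, assuming a hypothetical module Net with a learnable scalar scale used in forward(). It compares assigning an nn.Parameter, assigning a plain tensor with requires_grad=True, and calling register_parameter explicitly:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)
        # Assigning an nn.Parameter as an attribute registers it automatically
        self.scale = nn.Parameter(torch.tensor(1.0))
        # Assigning a plain tensor (even with requires_grad=True) does NOT register it
        self.not_registered = torch.tensor(0.0, requires_grad=True)

    def forward(self, x):
        return self.scale * self.fc(x)

net = Net()
# register_parameter is the explicit equivalent of assigning a Parameter attribute
net.register_parameter('myparam', nn.Parameter(torch.tensor(0.0)))

names = [name for name, _ in net.named_parameters()]
print(names)  # contains 'scale' and 'myparam', but not 'not_registered'
```

Because scale and myparam show up in net.parameters(), an optimizer built with torch.optim.SGD(net.parameters(), lr=...) will update them, while not_registered would have to be passed in separately.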
