Is it possible to have a variable inside the network definition that is trainable, i.e., actually gets updated during training?
To give a very simple example, suppose I want the momentum or the epsilon of batch normalization to be trainable parameters of the network. Can I simply do:
self.batch_mom1 = torch.tensor(0, dtype=torch.float32, device='cuda:0', requires_grad=True)
self.batch_mom2 = torch.tensor(0, dtype=torch.float32, device='cuda:0', requires_grad=True)
self.batch_mom3 = torch.tensor(0, dtype=torch.float32, device='cuda:0', requires_grad=True)
model = nn.Sequential(
    nn.Conv2d(3, 66, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
    nn.BatchNorm2d(66, eps=1e-05, momentum=self.batch_mom1.item(), affine=True),
    nn.ReLU(inplace=True),
    nn.Conv2d(66, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
    nn.BatchNorm2d(128, eps=1e-05, momentum=self.batch_mom2.item(), affine=True),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 192, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
    nn.BatchNorm2d(192, eps=1e-05, momentum=self.batch_mom3.item(), affine=True),
    nn.ReLU(inplace=True)
    ...
inside my graph and expect the variables to be tuned, since they are created with requires_grad=True?
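If I understand correctly, .item() returns a plain Python float, so I suspect the tensor gets cut out of the graph entirely. A minimal sketch of my concern (not from my actual model, just an illustration):

import torch

mom = torch.tensor(0.1, dtype=torch.float32, requires_grad=True)
x = mom.item()   # plain Python float, no autograd history
print(type(x))   # <class 'float'>
y = mom * 2      # tensor arithmetic, by contrast, stays in the graph
y.backward()
print(mom.grad)  # tensor(2.)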
If not, what is the correct way of doing this? Should I create a whole new layer for it?
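For instance, would something along these lines be the right direction? This is only a rough sketch with names I made up (LearnableEpsBatchNorm2d is hypothetical); it registers eps via nn.Parameter so the module tracks it, and it only uses per-batch statistics (no running stats):

import torch
import torch.nn as nn

class LearnableEpsBatchNorm2d(nn.Module):
    # Hypothetical sketch: batch norm with a learnable eps.
    def __init__(self, num_features):
        super().__init__()
        self.eps = nn.Parameter(torch.tensor(1e-5))           # learnable scalar
        self.weight = nn.Parameter(torch.ones(num_features))  # affine scale
        self.bias = nn.Parameter(torch.zeros(num_features))   # affine shift

    def forward(self, x):
        # Normalize with per-batch statistics over N, H, W.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)

My hope is that model.parameters() would then include eps automatically, so the optimizer trains it along with everything else. Is that the idiomatic approach?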