I would like to constrain my optimisation to a subset of the training parameters, also imposing a relation among them, in order to reduce the number of degrees of freedom (DoF) of my transform.
I have searched for "alias" and "parameter sharing", but nothing seems to fit.
For example, using STN components I would like to learn just 3 variables (e.g. 2 offsets and 1 scale) instead of the full 6 affine parameters. I thought this could be done by building the parameter matrix by hand.
In order to do so, in my model's `__init__`:
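For context, the 3-DoF matrix I have in mind (isotropic scale plus translation, no rotation or shear) would look like this; the concrete values here are just placeholders for illustration:

```python
import torch

s = 1.5               # single scale factor (dummy value)
tx, ty = 0.1, -0.2    # two offsets (dummy values)

# 2x3 affine matrix with only 3 degrees of freedom instead of 6:
# the two diagonal entries share one scale, the shear terms are fixed to 0
theta = torch.tensor([[s, 0.0, tx],
                      [0.0, s, ty]])
print(theta.shape)  # torch.Size([2, 3])
```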
```python
def __init__(self):
    # [...]
    self.mytensor = torch.rand(N, C, H, W)           # learn this
    self.offset = torch.randn(N, 2).double().cuda()  # learn this
    self.scale = torch.ones(1).double().cuda()       # learn this
    self.mytensor.requires_grad = True
    self.scale.requires_grad = True
    self.offset.requires_grad = True
    # [...]
```
while in my model's `forward` I compose a temporary tensor from the current learned values:
```python
def forward(self, index):
    with torch.no_grad():
        # temporary affine transform matrix
        cm = torch.zeros(1, 2, 3, dtype=torch.double,
                         device=self.mytensor.device, requires_grad=True)
        cm[0, 0, 0], cm[0, 1, 1] = self.scale, self.scale
        cm[0, :, 2] = self.offset[index]
    cm.requires_grad = True
    # affine transform matrix is ready for the actual forward pass
    grid = torch.nn.functional.affine_grid(cm, self.mytensor.size())
    y = torch.nn.functional.grid_sample(self.mytensor, grid)
    # some operations on y
    retval = y ** 2
    return retval
```
In my main script:
```python
model = myModel()
modelparams = [model.mytensor, model.scale, model.offset]
optim = torch.optim.Adam(modelparams, amsgrad=False)
loss = # ...
# [...]
loss.backward()
print(model.offset.grad, model.scale.grad)  # prints None
```
But the gradients on those two parameters are None. If, instead of building a temporary tensor, I use a complete theta tensor (as in a plain STN), everything works fine.
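For reference, a minimal version of the plain-STN setup that does work for me can be sketched like this (dummy shapes, CPU, single precision, just to illustrate that gradients reach theta):

```python
import torch
import torch.nn.functional as F

# learn the full 2x3 theta directly, as in a plain STN
x = torch.rand(1, 1, 8, 8, requires_grad=True)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]], requires_grad=True)

grid = F.affine_grid(theta, x.size(), align_corners=False)
y = F.grid_sample(x, grid, align_corners=False)
loss = (y ** 2).sum()
loss.backward()
print(theta.grad is not None)  # True: gradients flow into theta
```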
I thought that was a legitimate alias, but it seems it is not.
So how do I optimise just model.offset and model.scale? Thank you in advance.