Hi,
I am trying to implement a pseudo-likelihood parameter estimation method for a Potts model on images. I am using PyTorch because it seems convenient to leverage its GPU and auto-differentiation capabilities. However, I am not sure how I should implement some specific details of my model. I have two questions:
Question 1:
This model is overparameterized (4 different interaction parameters per neighbourhood: (top, bottom, right, left), a bit like a 2D convolution with a different kernel for every patch of the image), and I am not sure how my weight tensor should be set up so that backpropagation works properly. For now, because I don’t want diagonal parameters in my model, my weight initialization code looks like this:
param_size = self.unfold(torch.ones(img_size).view((1, 1, *img_size))).size()  # same shape as im2col(image)
nb_patchs = param_size[-1]
param_array = torch.zeros(param_size)
# learnable weights: one value per neighbour direction, per patch
self.param_vector = torch.ones((4, nb_patchs), requires_grad=True)
# write the four off-diagonal neighbour rows in-place; the rows that are not written stay at zero
param_array[:, 1, :] = self.param_vector[0, :]
param_array[:, 3, :] = self.param_vector[1, :]
param_array[:, 5, :] = self.param_vector[2, :]
param_array[:, 7, :] = self.param_vector[3, :]
self.params = param_array
I create a param_array tensor that has the same shape as im2col(image), and I set some of its values in-place with a grad-enabled tensor. I have self.params.grad equal to None (it is a non-leaf tensor with grad_fn=CopySlices), while self.param_vector.grad is not None.
This suggests to me that autograd can differentiate through the in-place setting just fine, and that this code makes sure the diagonal elements in every neighbourhood (the second dimension of self.params) stay at zero and are never updated.
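As a minimal standalone illustration of what I am observing (simplified, not my actual model code):

import torch

v = torch.ones(4, requires_grad=True)  # plays the role of self.param_vector
a = torch.zeros(3, 4)                  # plays the role of param_array
a[1, :] = v                            # in-place copy of a grad-enabled tensor
print(a.grad_fn)                       # <CopySlices ...>, so a is now a non-leaf tensor
a.sum().backward()
print(v.grad)                          # tensor([1., 1., 1., 1.]), gradient flows back to v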
However, the autograd documentation (Autograd mechanics — PyTorch 1.8.1 documentation) suggests that doing in-place operations is a bad idea, because it rarely frees up any memory. I am not concerned about memory, but does that still mean I should change my implementation? If yes, how should I “blank out” some elements of the parameter tensor so that they aren’t updated? My current understanding is that this can be done at the Tensor level (with the requires_grad flag), but is there a way to do it for a specific element of a tensor?
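In case it helps to see what I mean, this is the kind of alternative I am considering (a rough sketch, assuming a 3x3 neighbourhood with zero padding so that rows 1, 3, 5 and 7 of the unfolded array are the off-diagonal neighbours; the names PottsParams and build_params are just placeholders):

import torch
import torch.nn as nn

class PottsParams(nn.Module):
    def __init__(self, img_size, kernel_size=3):
        super().__init__()
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)
        with torch.no_grad():
            param_size = self.unfold(torch.ones(1, 1, *img_size)).size()  # (1, 9, nb_patchs)
        self.nb_patchs = param_size[-1]
        # only the four off-diagonal weights per patch are learnable parameters
        self.param_vector = nn.Parameter(torch.ones(4, self.nb_patchs))

    def build_params(self):
        # rebuild the full (1, 9, nb_patchs) array out-of-place on every call;
        # rows 1, 3, 5, 7 come from param_vector, the other rows are constant zeros
        zero = torch.zeros(1, self.nb_patchs, device=self.param_vector.device)
        layout = {1: 0, 3: 1, 5: 2, 7: 3}  # unfold row -> param_vector row
        rows = [self.param_vector[layout[r]].unsqueeze(0) if r in layout else zero
                for r in range(9)]
        return torch.cat(rows, dim=0).unsqueeze(0)

My understanding is that registering param_vector as an nn.Parameter would also make it show up in model.parameters() so the optimizer sees it, and that the diagonal positions simply never become parameters at all, but I am not sure this is the intended way to do it.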
Question 2:
My model is implemented as a torch.nn.Module subclass. Should my parameters be an attribute of the class? I have two other functions that compute some elements of the total log-likelihood of the batch, and a forward function that returns a scalar (the log-likelihood of the batch given the current parameters). Having the model parameters as an attribute of the class is a convenient way to pass them around, but is that actually how it should be done? I think this is causing some “Trying to backward through the graph a second time, but the saved intermediate results have already been freed” errors for me, as if the likelihood of a batch depended on the computation done in the previous batch.
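For reference, this is roughly the training loop I am aiming for (again just a sketch, reusing the hypothetical PottsParams / build_params names from above); my understanding is that the “backward a second time” error should go away as long as the derived tensor is rebuilt inside forward on every batch instead of being cached once as an attribute:

model = PottsParams(img_size=(32, 32))                    # placeholder image size
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # placeholder optimizer / lr

for batch in loader:           # loader is a placeholder yielding image batches
    optimizer.zero_grad()
    loss = -model(batch)       # forward() recomputes build_params(), so the graph is fresh
    loss.backward()            # no retain_graph needed if nothing is reused across batches
    optimizer.step()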
Sorry for the long post, and thanks in advance to anybody who can help me!