I am trying to implement pseudo-likelihood parameter estimation for a Potts model on images. I am using PyTorch because it seems convenient to leverage its GPU and auto-differentiation capabilities. However, I am not sure how I should implement some specific details of my model. I have two questions:
This model is overparameterized (four different interaction parameters per neighbourhood: top, bottom, right, left; a bit like a 2D convolution with a different kernel for every patch of the image), and I am not sure how my weight tensor should be set up so that backpropagation works properly. For now, because I don't want diagonal parameters in my model, my weight initialization code looks like this:
```python
param_size = self.unfold(torch.ones(img_size).view((1, 1, *img_size))).size()
nb_patchs = param_size[-1]
param_array = torch.zeros(param_size).float()
self.param_vector = torch.ones((4, nb_patchs), requires_grad=True).float()
param_array[:, 1, :] = self.param_vector[0, :]
param_array[:, 3, :] = self.param_vector[1, :]
param_array[:, 5, :] = self.param_vector[2, :]
param_array[:, 7, :] = self.param_vector[3, :]
self.params = param_array
```
I create a `param_array` tensor that has the same shape as `im2col(image)`, and I set some of its values in-place with a grad-enabled tensor. After calling `backward()`, `self.params.grad` is equal to `None`, but `self.param_vector.grad` is not. This suggests to me that autograd can differentiate through the in-place assignment just fine, and that this code ensures the diagonal elements of every neighbourhood (dimension 2 of `self.params`) stay at zero and are never updated.
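To make the behaviour above concrete, here is a minimal, self-contained sketch of the same pattern (with made-up shapes: 9 neighbourhood slots, 10 patches). It shows that autograd does flow through an in-place index assignment into a container tensor, and that only the leaf tensor gets a `.grad`:

```python
import torch

# Leaf tensor that should receive gradients
param_vector = torch.ones(4, 10, requires_grad=True)
# Plain container, analogous to param_array (does not require grad itself)
param_array = torch.zeros(1, 9, 10)

# In-place writes of grad-enabled slices, as in the snippet above
param_array[:, 1, :] = param_vector[0, :]
param_array[:, 3, :] = param_vector[1, :]
param_array[:, 5, :] = param_vector[2, :]
param_array[:, 7, :] = param_vector[3, :]

loss = param_array.sum()
loss.backward()

# Gradient reaches the leaf; each entry is used exactly once, so it is 1.
# param_array is now a non-leaf in the graph, so its .grad stays unpopulated.
```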
However, the autograd documentation (Autograd mechanics — PyTorch 1.8.1 documentation) warns against in-place operations, because they rarely actually lower memory usage. I am not concerned about memory, but does that still mean I should change my implementation? If yes, how should I "blank out" some elements of the parameter tensor so that they aren't updated? My current understanding is that this can only be done at the level of a whole tensor (with the `requires_grad` flag), but is there a way to do it for specific elements of a tensor?
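One common alternative to the in-place assembly (a sketch, not the only way, and the class and names below are hypothetical) is to keep the full tensor as a single leaf `nn.Parameter` and multiply it by a fixed 0/1 mask in the computation. Masked entries then always contribute zero to the output and receive exactly zero gradient, so a plain gradient step never moves them:

```python
import torch
import torch.nn as nn

class MaskedParams(nn.Module):
    """Per-element freezing via a fixed mask (illustrative sketch)."""
    def __init__(self, nb_patchs=10):
        super().__init__()
        # Full 9-slot neighbourhood per patch, all in one trainable leaf
        self.raw = nn.Parameter(torch.ones(9, nb_patchs))
        # Non-trainable 0/1 mask: keep only top/left/right/bottom (1, 3, 5, 7)
        mask = torch.zeros(9, 1)
        mask[[1, 3, 5, 7]] = 1.0
        self.register_buffer("mask", mask)  # moves with .to(device), not trained

    def params(self):
        # Masked entries are always zero and get zero gradient
        return self.raw * self.mask

m = MaskedParams()
loss = m.params().sum()
loss.backward()
# m.raw.grad is 1 on unmasked rows and exactly 0 on masked rows
```

Note that optimizers with weight decay would still touch the masked entries of `raw`; initializing them to zero (or re-applying the mask after each step) keeps them frozen in that case too.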
My model is implemented as a `torch.nn.Module` subclass. Should my parameters be an attribute of the class? I have two other functions that compute some terms of the total log-likelihood of the batch, and a `forward` function that returns a scalar (the log-likelihood of the batch given the current parameters). Having the model parameters as an attribute of the class is a convenient way to pass them around, but is that actually how it should be done? I think it is causing some `Trying to backward through the graph a second time, but the saved intermediate results have already been freed` errors for me, as if the likelihood of a batch depended on computation done for the previous batch.
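That error is consistent with building `self.params` once (e.g. in `__init__`) and reusing it across batches: the in-place assignments become part of one graph, which is freed after the first `backward()`. One fix (a sketch with made-up names and a toy forward, standing in for the real log-likelihood) is to register only the leaf as an `nn.Parameter` and rebuild the expanded tensor inside `forward`, so each batch gets a fresh graph:

```python
import torch
import torch.nn as nn

class PottsPL(nn.Module):
    """Illustrative sketch: rebuild the expanded tensor every forward."""
    def __init__(self, nb_patchs=10):
        super().__init__()
        self.nb_patchs = nb_patchs
        # Registered as a Parameter, so model.parameters() finds it
        self.param_vector = nn.Parameter(torch.ones(4, nb_patchs))

    def expanded_params(self):
        # Re-created on every call: the stitching belongs to the
        # current batch's graph only, so backward() never revisits
        # a graph that has already been freed.
        pa = torch.zeros(1, 9, self.nb_patchs)
        for row, slot in enumerate((1, 3, 5, 7)):
            pa[:, slot, :] = self.param_vector[row, :]
        return pa

    def forward(self, batch):
        # Placeholder scalar standing in for the batch log-likelihood
        return (batch * self.expanded_params()).sum()

model = PottsPL()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):  # several batches, no retain_graph needed
    opt.zero_grad()
    loss = model(torch.randn(1, 9, 10))
    loss.backward()
    opt.step()
```

Using `nn.Parameter` also answers the attribute question: parameters stored this way are registered with the module, show up in `model.parameters()`, and move with `model.to(device)`.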
Sorry for the long post, and thanks in advance to anyone who can help!