Hi,

I am trying to implement a pseudo-likelihood parameter estimation method for a Potts model on images. I am using PyTorch because it seems convenient to leverage its GPU and auto-differentiation capabilities. However, I am not sure how to implement some specific details of my model. I have two questions:

**Question 1:**

This model is overparameterized (4 different interaction parameters per neighbourhood: top, bottom, right, left; a bit like a 2D convolution with a different kernel for every patch of the image), and I am not sure how my weight tensor should be set up so that backpropagation works properly. For now, because I don't want diagonal parameters in my model, my weight initialization code looks like this:

```
# Shape of im2col(image): (1, kernel_size**2, nb_patchs)
param_size = self.unfold(torch.ones(img_size).view((1, 1, *img_size))).size()
nb_patchs = param_size[-1]
# Full parameter array, same shape as the unfolded image; stays zero
# everywhere except the 4-neighbourhood positions filled in below.
param_array = torch.zeros(param_size, dtype=torch.float32)
# Trainable leaf tensor holding the 4 interaction parameters per patch.
self.param_vector = torch.ones((4, nb_patchs), dtype=torch.float32, requires_grad=True)
param_array[:, 1, :] = self.param_vector[0, :]  # top
param_array[:, 3, :] = self.param_vector[1, :]  # left
param_array[:, 5, :] = self.param_vector[2, :]  # right
param_array[:, 7, :] = self.param_vector[3, :]  # bottom
self.params = param_array
```

I create a `param_array` tensor that has the same shape as `im2col(image)`, and I set some of its values in-place with a grad-enabled tensor. I have `self.params.grad` equal to `None` (with `grad_fn=CopySlices`), while `self.param_vector.grad` is not `None`.

This suggests to me that autograd can differentiate through the in-place assignment just fine, and that this code makes sure the diagonal elements of every neighbourhood (the second dimension of `self.params`) *will not* be updated.

However, the autograd documentation (Autograd mechanics — PyTorch 1.8.1 documentation) suggests that in-place operations are usually a bad idea, because they don't actually free up any memory. I am not concerned about memory, but does that still mean I should change my implementation? If yes, how should I "blank out" some elements of the parameter tensor so that they aren't updated? My current understanding is that this can be done at the tensor level (with the `requires_grad` flag), but is there a way to do it for a specific element of a tensor?
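
To make the question concrete, here is a minimal standalone sketch of the masking idea I have in mind (sizes are made up; `nb_patches` stands in for the number of patches from `unfold`): keep a full-size trainable tensor plus a constant 0/1 mask, and multiply the two whenever the parameters are used, so the masked-out positions always receive zero gradient.

```
import torch

# Made-up sizes standing in for the ones computed from unfold()
nb_patches = 100
param_size = (1, 9, nb_patches)  # shape of im2col(image) for a 3x3 neighbourhood

# Full-size trainable leaf tensor, plus a constant 0/1 mask selecting
# the 4-neighbourhood positions (1 = top, 3 = left, 5 = right, 7 = bottom).
full_params = torch.zeros(param_size, requires_grad=True)
mask = torch.zeros(param_size)
mask[:, [1, 3, 5, 7], :] = 1.0

# Every use of the parameters goes through the mask, so the masked-out
# entries always get zero gradient and a plain gradient step never moves them.
params = full_params * mask
loss = params.sum()                            # stand-in for the log pseudo-likelihood
loss.backward()
print(full_params.grad[:, 0, :].abs().max())   # tensor(0.) for a masked-out position
```

Would something like this mask be the idiomatic way to freeze individual elements, or is rebuilding the full array from a small leaf tensor (as in my code above) preferable?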

**Question 2:**

My model is implemented as a `torch.nn.Module`

subclass. Should my parameters be an attribute of the class ? I have 2 other functions that compute some elements of the total log-likelihood of the batch, and a forward function that returns a scalar (log-likelihood of the batch given current parameters). Having the model parameters as an attribute of the class is a convenient way to pass parameters. Is that actually how it should be done ? I think this is causing some `Trying to backward through the graph a second time, but the saved intermediate results have already been freed`

problems for me, as if the likelihood of a batch was depending on the computation done in the previous batch.
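
To show what I mean, here is a stripped-down skeleton of the structure I am considering (the actual pseudo-likelihood maths is left out; the class name, kernel size, and shapes are made up). Is registering `param_vector` as an `nn.Parameter` and rebuilding the full array inside `forward()`, so that each batch gets a fresh graph, the right way to avoid this?

```
import torch
import torch.nn as nn

class PottsPseudoLikelihood(nn.Module):
    """Skeleton only: the actual pseudo-likelihood computation is omitted."""

    def __init__(self, nb_patches):
        super().__init__()
        self.unfold = nn.Unfold(kernel_size=3, padding=1)
        # Registered as a Parameter so it shows up in model.parameters().
        self.param_vector = nn.Parameter(torch.ones(4, nb_patches))
        self.nb_patches = nb_patches

    def build_param_array(self):
        # Rebuilt on every call, so no autograd graph is shared between batches.
        param_array = torch.zeros(1, 9, self.nb_patches,
                                  device=self.param_vector.device)
        param_array[:, 1, :] = self.param_vector[0, :]  # top
        param_array[:, 3, :] = self.param_vector[1, :]  # left
        param_array[:, 5, :] = self.param_vector[2, :]  # right
        param_array[:, 7, :] = self.param_vector[3, :]  # bottom
        return param_array

    def forward(self, batch):
        params = self.build_param_array()
        patches = self.unfold(batch)   # im2col of the batch: (B, 9, nb_patches)
        # Stand-in for the scalar log pseudo-likelihood of the batch:
        return (params * patches).sum()
```

If this is the right structure, I would build the optimizer from `model.parameters()` and call the model once per batch, so I don't see where a graph from a previous batch could leak in, unless something like accumulating the loss tensor across batches keeps old graphs alive.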

Sorry for the long post, and thanks in advance to anybody who can help!