Expand vs repeat used in __init__ raises an error on backward

Hi.

I am implementing a Gaussian observation model for a multi-output task. The class I have implemented has the following __init__ method; basically, the observation noise can either be shared across the different outputs or kept separate per output.

import torch
import torch.nn as nn

class GaussianLinearMean(nn.Module):

    def __init__(self, out_dim: int, noise_init: float, noise_is_shared: bool):
        super(GaussianLinearMean, self).__init__()

        self.out_dim = out_dim
        self.noise_is_shared = noise_is_shared

        # cg.dtype is the global dtype taken from my configuration module
        if noise_is_shared:
            # shared noise: create one parameter and expand it to out_dim rows
            log_var_noise = nn.Parameter(
                torch.ones(1, 1, dtype=cg.dtype)
                * torch.log(torch.tensor(noise_init, dtype=cg.dtype))
            )
            log_var_noise = log_var_noise.expand(out_dim, 1)
        else:
            # independent noise: create a vector of noise variance parameters
            log_var_noise = nn.Parameter(
                torch.ones(out_dim, 1, dtype=cg.dtype)
                * torch.log(torch.tensor(noise_init, dtype=cg.dtype))
            )

        self.log_var_noise = log_var_noise

For the case in which the observation noise is shared, I just create one parameter that is expanded to match the number of outputs. If I use expand, i.e. log_var_noise = log_var_noise.expand(out_dim, 1), everything works fine. However, if I use repeat, i.e. log_var_noise = log_var_noise.repeat(out_dim, 1), I get the following error when calling backward on the loss:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

in the second iteration of the algorithm. As far as I understand, the difference between repeat and expand is whether the memory is copied or not, so I would guess that the computed gradient is equivalent for both methods. I am just wondering where the error comes from. The problem goes away if repeat is called at each call to forward rather than once in __init__ (see the sketch below), but I would prefer to do it only once for speed.
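For reference, this is roughly what I mean by calling repeat inside forward. It is a simplified sketch of my class: dtype handling and the non-shared branch are omitted, and the forward signature and the likelihood expression are just placeholders.

import math
import torch
import torch.nn as nn

class GaussianLinearMean(nn.Module):

    def __init__(self, out_dim: int, noise_init: float):
        super(GaussianLinearMean, self).__init__()
        self.out_dim = out_dim
        # keep only the single (1, 1) leaf parameter; no expand/repeat here
        self.log_var_noise = nn.Parameter(torch.full((1, 1), math.log(noise_init)))

    def forward(self, mean, targets):
        # repeat (or expand) the shared noise at every call instead of in __init__
        log_var = self.log_var_noise.repeat(self.out_dim, 1)
        var = torch.exp(log_var)
        # placeholder per-output Gaussian log-likelihood
        return -0.5 * (math.log(2 * math.pi) + log_var + (targets - mean) ** 2 / var)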

Hi,

The difference is that the current backward formula for repeat needs to access the original input (and so saves it), while expand does not.
So, if you share these ops across multiple backward passes, the expand version has no issue (since nothing was saved, nothing was freed by the first backward), but the repeat version no longer has its saved Tensors, which leads to this error.
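To illustrate, here is a toy snippet based on the current behavior (it relies on these implementation details, so it may change in future versions):

import torch

p = torch.ones(1, 1, requires_grad=True)
y = p.expand(3, 1)   # expand saves nothing for its backward
y.sum().backward()
y.sum().backward()   # fine: nothing was freed in the expand node

q = torch.ones(1, 1, requires_grad=True)
z = q.repeat(3, 1)   # repeat saves its input for its backward
z.sum().backward()
z.sum().backward()   # RuntimeError: buffers have already been freed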

If you want to use repeat, you will have to do it at each forward, I'm afraid.

This difference in behavior could be removed though (and repeat made a bit faster); I opened an issue to track that: https://github.com/pytorch/pytorch/issues/40701

Thanks for your reply. I have dug into it a bit, and it seems that both methods have to be called within the forward. If not, the gradient has a value of None in the expand case.

Right, this is a different question:
The short answer is that the .grad field is only populated for leaf Tensors (Tensors with no history), and the result of repeat/expand is not a leaf, so what you see is expected.
You will need two Tensors: the original Parameter with a single element, which is a leaf, and the expanded/repeated one that is used in the forward (which won't get a gradient).
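For example, a tiny illustration of the leaf vs non-leaf distinction:

import torch

p = torch.nn.Parameter(torch.ones(1, 1))   # leaf Tensor
y = p.expand(3, 1)                          # non-leaf: it has a grad_fn
(y ** 2).sum().backward()
print(p.is_leaf, p.grad)   # True  tensor([[6.]])
print(y.is_leaf, y.grad)   # False None (.grad is not populated for non-leaf Tensors)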

Or you can move the repeat/expand to the forward (the safest bet as it does not rely on internal implementation details of expand/repeat).