Hi.

I am implementing a Gaussian observation model for a multi-output task. The class's `__init__` method is shown below; the idea is that the observation noise can optionally be shared across the different outputs.

```
import torch
import torch.nn as nn

# cg is a config object defined elsewhere in my code; cg.dtype holds the global dtype.

class GaussianLinearMean(nn.Module):
    def __init__(self, out_dim: int, noise_init: float, noise_is_shared: bool):
        super(GaussianLinearMean, self).__init__()
        self.out_dim = out_dim
        self.noise_is_shared = noise_is_shared
        if noise_is_shared:
            # shared noise: create one parameter and expand it to out_dim rows
            log_var_noise = nn.Parameter(
                torch.ones(1, 1, dtype=cg.dtype)
                * torch.log(torch.tensor(noise_init, dtype=cg.dtype))
            )
            log_var_noise = log_var_noise.expand(out_dim, 1)
        else:
            # independent noise: a vector of out_dim noise variance parameters
            log_var_noise = nn.Parameter(
                torch.ones(out_dim, 1, dtype=cg.dtype)
                * torch.log(torch.tensor(noise_init, dtype=cg.dtype))
            )
        self.log_var_noise = log_var_noise
```
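
For context, here is a minimal standalone sketch (shapes made up) of what the two variants do to memory:

```
import torch
import torch.nn as nn

p = nn.Parameter(torch.zeros(1, 1))

expanded = p.expand(3, 1)  # a view: no copy, all rows alias p's storage
repeated = p.repeat(3, 1)  # a new tensor: the value is copied 3 times

print(expanded.data_ptr() == p.data_ptr())  # True  -> shares memory
print(repeated.data_ptr() == p.data_ptr())  # False -> owns its own memory
```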

For the case in which the observation noise is shared, I create a single parameter and expand it to match the number of outputs. If I use `expand`, i.e. `log_var_noise = log_var_noise.expand(out_dim, 1)`, everything works fine. However, if I use `repeat` instead, i.e. `log_var_noise = log_var_noise.repeat(out_dim, 1)`, I get the following error in the second iteration of the algorithm, when calling `backward` on the loss:

```
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
```
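
I believe this is the minimal pattern that shows the same behaviour (the real model has more going on, but the structure is the same):

```
import torch
import torch.nn as nn

p = nn.Parameter(torch.zeros(1, 1))
log_var_noise = p.repeat(3, 1)   # done once, as in my __init__

for step in range(2):
    loss = log_var_noise.sum()
    loss.backward()              # raises the error above in the second iteration
# replacing .repeat(3, 1) with .expand(3, 1) runs without error
```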

As far as I understand, the difference between `repeat` and `expand` is whether the memory is copied or not, so I would expect the computed gradient to be equivalent for both methods. I am just wondering where the error might be. The problem goes away if `repeat` is called at each call to `forward` rather than once in `__init__` (see the sketch below). However, I would prefer to do it only once for speed.
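
For completeness, this is the workaround I am referring to, as a reduced sketch (hypothetical names, only the noise handling shown, without the `cg.dtype` plumbing):

```
import torch
import torch.nn as nn

class GaussianLinearMeanV2(nn.Module):
    # Reduced sketch: keep only the raw parameter in __init__
    # and materialise the per-output copy inside forward.
    def __init__(self, out_dim: int, noise_init: float, noise_is_shared: bool):
        super().__init__()
        self.out_dim = out_dim
        self.noise_is_shared = noise_is_shared
        rows = 1 if noise_is_shared else out_dim
        self.log_var_noise = nn.Parameter(
            torch.log(torch.tensor(noise_init)) * torch.ones(rows, 1)
        )

    def forward(self, x):
        if self.noise_is_shared:
            # repeat here, so the op is recorded in each iteration's graph
            log_var = self.log_var_noise.repeat(self.out_dim, 1)
        else:
            log_var = self.log_var_noise
        # ... the rest of the model would use x and log_var here ...
        return log_var
```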