Hey, I am currently experiencing an issue with my regularization loss (which is computed and backpropagated on its own, without first being combined with a normal loss): it gives me the error ‘Trying to backward through the graph a second…’.

I modify the weights using the outputs of another layer, and that does not allow me to specify `retain_graph=False` in `torch.autograd.grad()`.

Here is a simple code example which fails:

```
import torch
import torch.nn as nn

w = nn.Parameter(torch.zeros(10, 10))
dense = nn.Linear(10, 10)
x = torch.rand(1, 10).requires_grad_(True)

# The weight is modulated by the output of another layer,
# so the forward graph of dense(x) becomes part of y's graph.
y = x.matmul(w.t() * dense(x).mean(0)).mean()

grad = torch.autograd.grad(
    outputs=y,
    inputs=x,
    retain_graph=False,
    create_graph=True,
    only_inputs=True,
)[0]

# Fails: backpropagating through grad needs the forward graph of
# dense(x), which retain_graph=False freed above.
grad.mean().backward()
```
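
For reference, the same example does run if I keep the graph alive, which is what I am doing right now:

```
# Same setup as above; the only change is retain_graph=True.
grad = torch.autograd.grad(
    outputs=y,
    inputs=x,
    retain_graph=True,
    create_graph=True,
)[0]
grad.mean().backward()  # succeeds, since the forward graph is still alive
```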

And here is a version that works, but it does not modify the weight:

```
import torch
import torch.nn as nn

w = nn.Parameter(torch.zeros(10, 10))
x = torch.rand(1, 10).requires_grad_(True)

# No weight modulation here: the gradient of y w.r.t. x
# depends only on the leaf parameter w.
y = x.matmul(w.t()).mean()

grad = torch.autograd.grad(
    outputs=y,
    inputs=x,
    retain_graph=False,
    create_graph=True,
    only_inputs=True,
)[0]

# Works: grad is a function of w alone, so this backward
# never touches the freed forward graph.
grad.mean().backward()
```
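
If I am reasoning about this correctly, the second version works because the gradient of `y` with respect to `x` is just `w.mean(0)`, a function of the leaf parameter alone, so backpropagating through `grad` never needs the intermediate buffers that `retain_graph=False` freed. A quick sanity check of that (my own check; with the zero init it is trivially true, but the identity holds for any `w`):

```
# dy/dx_j = (1/10) * sum_k w[k, j] = w.mean(0)[j]
print(torch.allclose(grad, w.mean(0).expand_as(grad)))  # True
```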

Is there any way for me to run this without using `retain_graph=True`? Or is there no difference in memory/performance from retaining the graph in my specific example?
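
In case it helps with the second question, this is how I would measure the peak memory of the `retain_graph=True` version (a rough sketch, assuming a CUDA device, since `reset_peak_memory_stats`/`max_memory_allocated` only track CUDA allocations):

```
import torch
import torch.nn as nn

device = "cuda"
w = nn.Parameter(torch.zeros(10, 10, device=device))
dense = nn.Linear(10, 10).to(device)
x = torch.rand(1, 10, device=device).requires_grad_(True)

torch.cuda.reset_peak_memory_stats(device)
y = x.matmul(w.t() * dense(x).mean(0)).mean()
grad = torch.autograd.grad(y, x, retain_graph=True, create_graph=True)[0]
grad.mean().backward()
print(torch.cuda.max_memory_allocated(device))  # peak bytes for this pass
```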