Hi, I am attempting to train a Siamese neural network that defines a particular embedding function f(x), while performing optimization simultaneously with a clustering model (Gaussian mixture model) on that embedding space.

I want the NN weights to be updated with respect to both a loss function that measures the quality of the embedding, but also the Gaussian mixture model variational loss (ELBO).

However, when I call loss.backward(), I get the following error:

`RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.`

Here is my code. I've omitted the GMM code and a few other functions as they are about 1,000 lines and don't seem particularly helpful to include.

```
class Siamese(nn.Module):
    def __init__(self, input_size):
        super(Siamese, self).__init__()
        self.hidden = nn.Linear(input_size, 2)
        self.out = nn.Linear(2, 2)

    def forward_one(self, x):
        x = self.hidden(x)
        x = self.out(x)
        return x

    def forward(self, x1, x2):
        out1 = self.forward_one(x1)
        out2 = self.forward_one(x2)
        dis = torch.norm(out1 - out2, dim=1)
        return dis

data, labels = load_dataset()
net = Siamese(data.shape[1])
net_optim = torch.optim.Adam(net.parameters(), lr=0.05, weight_decay=1)

# initialize weights, means, and covariances for the Gaussian clusters
concentrations, means, covariances, precisions = initialization(net.forward_one(data))

for i in range(1000):
    net_optim.zero_grad()
    pairs, pair_labels = pairGenerator(data, labels)  # samples some pairs of datapoints
    outputs = net(pairs[:, 0, :], pairs[:, 1, :])  # computes pairwise distances
    embedding = net.forward_one(data)  # embeds all data in the NN space
    log_prob, log_likelihoods = expectation_step(embedding, means, precisions, concentrations)
    concentrations, means, covariances, precisions = maximization_step(embedding, log_likelihoods)
    loss = FullLoss(outputs, pair_labels, log_likelihoods, log_prob, precisions, concentrations)
    loss.backward()
    net_optim.step()
```

FullLoss combines a term based on the pairwise distances in the NN embedding space with the Gaussian mixture model loss (based on the likelihoods under the current weights, means, and covariances).
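To make the question concrete, here is a simplified sketch of what FullLoss computes; the real function takes more arguments (`log_prob`, `precisions`, `concentrations`) and is much longer, and the `margin` and `gmm_weight` parameters are my own placeholder names:

```python
import torch

def full_loss_sketch(distances, pair_labels, log_likelihoods,
                     margin=1.0, gmm_weight=1.0):
    # Contrastive term on the embedding: pull similar pairs (label 1)
    # together, push dissimilar pairs (label 0) at least `margin` apart.
    similar = pair_labels * distances.pow(2)
    dissimilar = (1 - pair_labels) * torch.clamp(margin - distances, min=0).pow(2)
    contrastive = (similar + dissimilar).mean()

    # GMM term: negative mean log-likelihood of the embedded points
    # under the current mixture (stand-in for the ELBO term).
    gmm_loss = -log_likelihoods.mean()

    return contrastive + gmm_weight * gmm_loss
```

The key point is that both terms depend on tensors produced by the network, so gradients from both should flow back into the NN weights on `loss.backward()`.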

Can anyone tell me why my `.backward()` call requires `retain_graph=True`? Retaining the graph is impractical for my use case: by the 400th iteration or so, each iteration takes about 30 minutes.

Thanks!