Two loss criteria for a simple autoencoder

Hi

I have created a simple linear AE and am using the Adam optimizer.
I have two losses to minimize:

loss1 = reconstruction loss
loss2 = Euclidean distance between the centroids and the encoded space

So I wrote something like the code below (PyTorch version 1.9.0+cu102):

loss1 = 0.8 * criterion(decoded, image)                              # weighted reconstruction loss
loss2 = torch.sum(torch.cdist(encoded, dist_matrix.to(device))**2)   # squared distances between encodings and centroids

loss = loss1 + loss2
loss.backward(retain_graph=True)
optimizer.step()
optimizer.zero_grad()

Autoencoder code:

class LAE3d(nn.Module):
    def __init__(self):
        # N, 784(28*28)
        super(LAE3d, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(1*28 * 28, 1000),
            nn.ReLU(),
            nn.Linear(1000, 250),
            nn.ReLU(),
            nn.Linear(250, 50),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.Linear(50, 250),
            nn.ReLU(),
            nn.Linear(250, 1000),
            nn.ReLU(),
            nn.Linear(1000, 1*28*28),
            nn.ReLU()
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded


model = LAE3d()
model.to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# `summary` is assumed to come from torchsummary (from torchsummary import summary)
print(summary(model, input_size=(1, 1*28*28)))
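
For reference, a quick shape check of the forward pass with a random batch (the batch size of 4 is just for illustration):

# quick shape check with a random, flattened batch
x = torch.rand(4, 1*28*28).to(device)
encoded, decoded = model(x)
print(encoded.shape)  # torch.Size([4, 50])
print(decoded.shape)  # torch.Size([4, 784])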

Basically, I am trying to recreate this Paper.

I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [250, 50]], which is output 0 of TBackward, is at version 305; expected version 304 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

The retain_graph=True option might be the root cause of this error and is often not needed.
Could you explain why you are using it and whether your model would work without this argument?

Hi

I use retain_graph=True to avoid freeing the gradients applied to the encoder and decoder weights, i.e. to avoid a mismatch of the weight data when the second loss is applied.

No, without retain_graph=True the model fails with the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

I don’t quite understand the use case, since you are adding both losses in your initial post and calling loss.backward() on the accumulated loss.

Use case:
I want to cluster the encoded data into n clusters while the model is training, so loss1 is the reconstruction error and loss2 is the Euclidean distance between the centroids and the encoded data.

I want to minimize both losses to improve the clustering quality.

So I compute loss1 + loss2 and call loss.backward().
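
To make the shapes concrete (the numbers below are just an example, e.g. a batch of 32 encodings and 10 centroids in the 50-dimensional encoded space):

encoded = torch.randn(32, 50)            # e.g. a batch of encodings from the encoder
dist_matrix = torch.randn(10, 50)        # e.g. the centroid coordinates
d = torch.cdist(encoded, dist_matrix)    # pairwise Euclidean distances, shape (32, 10)
loss2 = torch.sum(d**2)                  # sum of squared distances over the batch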

That sounds valid and I don’t know why retain_graph=True would be needed.
Based on your description you are executing a forward pass through the model, calculating the losses, accumulating them, and calling backward on the final loss.
I don’t see why the graph should be kept alive or where the “backward through the graph a second time” issue would be raised.
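
For reference, a minimal sketch of the loop I would expect to work, assuming dist_matrix is a constant (num_centroids, 50) tensor that is not attached to a previous computation graph and loader yields (image, label) batches:

for image, _ in loader:
    image = image.view(image.size(0), -1).to(device)   # flatten to (batch, 784)

    optimizer.zero_grad()
    encoded, decoded = model(image)

    loss1 = 0.8 * criterion(decoded, image)                    # reconstruction loss
    loss2 = torch.sum(torch.cdist(encoded, dist_matrix)**2)    # squared distances to centroids

    loss = loss1 + loss2
    loss.backward()      # single backward on the accumulated loss, no retain_graph needed
    optimizer.step()

If dist_matrix were itself produced by an earlier forward pass and still referenced the old graph, that could explain both errors; detaching it (dist_matrix = dist_matrix.detach()) before the loop would rule this out.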