Problem calling backward a second time

I’m facing an issue, probably a silly one, that I can’t solve.

I would like to train a generator of images. My architecture consists of a generator that maps noise to a 28 x 28 tensor; on top of that I compute an estimate of the maximum mean discrepancy (MMD, with a Gaussian kernel) between the generated images and my real images (MNIST). This discrepancy is my loss, which I want to backpropagate with respect to the generator’s parameters. Not sure if this is gonna work, but it should :slight_smile: .
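For context, the mmd function itself isn’t shown in this post; a minimal sketch of what such an estimator might look like, assuming a single Gaussian (RBF) kernel with a hand-picked bandwidth sigma (in practice a sum of kernels over several bandwidths is often used), is:

import torch

def mmd(x, y, sigma=10.0):
    # flatten each image into a vector: (batch, 28 * 28)
    x = x.reshape(x.size(0), -1)
    y = y.reshape(y.size(0), -1)
    xy = torch.cat([x, y], dim=0)
    # pairwise squared Euclidean distances between all samples
    d2 = torch.cdist(xy, xy).pow(2)
    # Gaussian kernel matrix
    k = torch.exp(-d2 / (2 * sigma ** 2))
    n = x.size(0)
    k_xx = k[:n, :n].mean()
    k_yy = k[n:, n:].mean()
    k_xy = k[:n, n:].mean()
    # biased estimate of MMD^2
    return k_xx + k_yy - 2 * k_xy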

This is the main part of my optimization:

for epoch in range(epochs):

    noise = torch.randn((noise_batch_size, latent_size))
    generated_images = net(noise)

    for input_images, _ in train_loader:
        input_images = input_images.to(device)
        optimizer.zero_grad()
        loss = mmd(input_images.squeeze(), generated_images.squeeze())
        loss.backward()
        optimizer.step()

Sadly, it raises a RuntimeError on the second backward pass:

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

At this point I tried to “correct” my code with loss.backward(retain_graph=True), but I get another RuntimeError, again on the second backward pass:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 10, 3, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Unfortunately, I wasn’t able to fix this (I tried setting inplace=False in a few places and other suggestions I found in old topics, but I guess I’m missing what the actual problem is here…).

Sorry in advance, this is surely a newbie mistake.

I don’t know much about the approach you mentioned, but the error is caused because you backpropagate through the graph created by generated_images = net(noise) more than once inside the inner for loop. After the first loss.backward() the graph’s intermediate buffers are freed, which is the first error; and with retain_graph=True, optimizer.step() then modifies the generator’s parameters in place between backward calls, which is exactly the in-place version mismatch reported by the second error. The usual pattern is to put the forward pass (generated_images = net(noise)) inside the inner for loop, so that every batch builds a fresh graph, such as:

for epoch in range(epochs):
    for input_images, _ in train_loader:
        # build a fresh graph for every batch
        noise = torch.randn((noise_batch_size, latent_size))
        generated_images = net(noise)

        input_images = input_images.to(device)
        optimizer.zero_grad()
        loss = mmd(input_images.squeeze(), generated_images.squeeze())
        loss.backward()
        optimizer.step()
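As a side note: if you really did want to reuse a single batch of generated_images for several updates, retain_graph=True alone wouldn’t be enough, because optimizer.step() changes the parameters the saved graph depends on. One alternative would be to accumulate the losses over the inner loop and call backward() and step() only once per epoch.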

You’re definitely right. As I said, it was a really silly mistake! Thanks a lot!!! :slight_smile: