One of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 48, 3, 3]] is at version 2; expected version 1 instead

KeisukeShimokawa · May 28, 2020, 2:08pm

While training a GAN, the first iteration ran fine, but the second iteration raised an error, apparently on the Discriminator side.

The details of the error are as follows. The shape [3, 48, 3, 3] corresponds to the first layer of the Discriminator, but I didn’t use any in-place operations.

one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 48, 3, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

What is causing this error?

Full Discriminator code is below

https://github.com/KeisukeShimokawa/papers-challenge/blob/master/src/gan/FQGAN/models/FQGAN_64.py

albanD · May 28, 2020, 2:14pm

Hi,

Is the training loop the one here?
Given that it does not happen on the first backward, it can be one of two things:

An optimizer.step() modified a weight that was already used by a forward pass, and you call backward on that forward's output afterwards:

out = model(inp)
opt_model.step() # Modifies the model's weights inplace
out.sum().backward() # Will fail with this error
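To make this first case concrete, here is a minimal, self-contained sketch that reproduces the error on CPU and then shows the usual fix. The tiny two-layer network, the SGD optimizer, and the random input are assumptions made up for illustration; they are not taken from the FQGAN code linked above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model; two layers so that the second layer's weight
# must be saved by the forward pass for use in backward.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x = torch.randn(16, 4)

# Populate .grad so that opt.step() below really updates the weights.
net(x).sum().backward()

out = net(x)   # forward pass saves the current weights for backward
opt.step()     # updates those same weights in place
try:
    out.sum().backward()   # the saved weights no longer match
    failed = False
    message = ""
except RuntimeError as err:
    failed = True
    message = str(err)

print(failed, message[:80])

# Fix: run forward, backward, and step in that order, so that no weight
# update happens between a forward pass and its backward pass.
opt.zero_grad()
out = net(x)
out.sum().backward()
opt.step()
recovered = True
```

In a GAN loop, one place to look for this pattern is a generator loss computed through the Discriminator before the Discriminator's optimizer step, with the generator's backward called after that step. Whether that is what happens in the training loop from this thread is not confirmed here.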

You reuse some variables from one iteration to the next by mistake. To check this, the simplest thing is to wrap the content of the for loop into a single function to make sure everything goes out of scope properly:

def one_step(sample):
    # Do one step with sample using everything in scope.
    pass  # placeholder: forward, backward, and optimizer step go here

for sample in dataloader:
    one_step(sample)
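A slightly fuller sketch of the same idea, assuming a hypothetical toy model, optimizer, and an in-memory list standing in for the dataloader (none of these come from the original training code). Everything that belongs to the autograd graph is created inside one_step, and only a plain Python float is returned, so no tensor from one iteration can leak into the next.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model and optimizer, used only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def one_step(sample):
    # All intermediate tensors live only inside this function.
    optimizer.zero_grad()
    loss = model(sample).pow(2).mean()
    loss.backward()
    optimizer.step()
    # Return a Python float, not a tensor that still references the graph.
    return loss.item()


# Stand-in for a real DataLoader: three random batches.
dataloader = [torch.randn(16, 4) for _ in range(3)]

losses = [one_step(sample) for sample in dataloader]
print(losses)
```

If the error disappears after this kind of refactoring, some tensor was probably being carried over between iterations in the original loop.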

JH_S · October 18, 2021, 3:35pm

You are my angel. Have a nice day!