"RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1024, 1]] is at version 2; expected version 1 instead."

I have a cascaded SR GAN (two generators and two discriminators)
(the single-generator, single-discriminator code works fine; it is not uploaded here).

The essentials are as follows:

self.no_of_gan = [16, 64, 256]

with g_optim_wrapper.optim_context(self):
    batch_outputs = self.generator(batch_inputs)
    # list of 2 generator outputs: (B, 3, 64, 64) & (B, 3, 256, 256) -> 4x super-resolution GAN

all_parsed_losses_g = 0
set_requires_grad(self.discriminator, False)

for i in range(2):
    req_size = self.no_of_gan[i + 1]
    # batch_gt_data: (B, 3, 256, 256)
    gt_resize = torch.nn.functional.interpolate(batch_gt_data, size=req_size)
    parsed_losses_g, log_vars_d = self.g_step_with_optim(
        batch_outputs=batch_outputs[i], batch_gt_data=gt_resize,
        optim_wrapper=optim_wrapper, index=i)

    all_parsed_losses_g += parsed_losses_g
    log_vars.update(log_vars_d)

The loop above accumulates the generator's GAN loss (it does not update any parameters yet). I then compute the discriminator loss and update the discriminator's weights.
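(For reference, set_requires_grad is assumed to be the usual helper that toggles requires_grad on every parameter of a module, roughly:)

def set_requires_grad(module, flag):
    # enable/disable gradient tracking for all parameters of `module`
    for param in module.parameters():
        param.requires_grad = flag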

set_requires_grad(self.discriminator, True)

for i in range(self.n_layers):
    req_size = self.no_of_gan[i + 1]
    gt_resize = torch.nn.functional.interpolate(batch_gt_data, size=req_size)

    log_vars_d = self.d_step_with_optim(
        batch_outputs=batch_outputs[i].detach(),
        batch_gt_data=gt_resize,
        optim_wrapper=optim_wrapper, index=i)

    log_vars.update(log_vars_d)

set_requires_grad(self.discriminator, False)  # freeze the discriminator again
all_parsed_losses_g.backward()                # generator backward; the RuntimeError is raised here
set_requires_grad(self.discriminator, True)

This should work, since the discriminator's updated weights should have nothing to do with the generator's loss backward.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1024, 1]] is at version 2; expected version 1 instead.

I have uploaded the code for reproducing the error:

NOTE1: If I move the generator update step before the discriminator update step, the error is resolved. If updating the discriminator weights were the problem, then why does the non-cascaded version (one generator and one discriminator) not see the same error?
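Schematically, with hypothetical names, the two orderings are:

# ordering that raises the RuntimeError:
g_loss = compute_generator_losses()   # GAN loss: the discriminator's forward is recorded
update_discriminators()               # optimizer.step() on the discriminators
g_loss.backward()                     # <- RuntimeError here

# ordering that resolves it:
g_loss = compute_generator_losses()
g_loss.backward()
update_generator()
update_discriminators()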

NOTE2: Updating the generator's weights after the discriminator works as long as I don't involve any discriminator-based loss (pixel loss, perceptual loss, etc. all work fine until I include the discriminator loss).

You might be running into this error if stale forward activations are used during a backward call.
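To narrow down which tensor was modified, you could also enable anomaly detection while debugging (note that it slows down the run considerably, so only use it temporarily):

torch.autograd.set_detect_anomaly(True)
# the failing backward will then additionally print the traceback of the
# forward call that created the offending tensor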

If I'm interpreting your answer correctly, doing .backward() from a stale output should throw this error. That is,
“Updating the discriminator is somehow updating the generator's weights, which in turn, when the generator's loss .backward() runs after the discriminator's weight update, raises the error due to the stale generator.”

Before the discriminator update, I did:

model_weights = {}
for name, param in self.generator.named_parameters():
    model_weights[name] = param.mean().item()

And after the discriminator update:

new_model_weights = {}
for name, param in self.generator.named_parameters():
    new_model_weights[name] = param.mean().item()
    assert new_model_weights[name] == model_weights[name]
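(As a side note, a stricter variant of this check: two different tensors can share the same mean, so cloning the full tensors before the update is safer.)

# snapshot full copies of the generator's parameters before the update
snapshot = {name: param.detach().clone()
            for name, param in self.generator.named_parameters()}

# ... discriminator update ...

# compare element-wise afterwards
for name, param in self.generator.named_parameters():
    assert torch.equal(snapshot[name], param.detach()), name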

The other possibility is that the generator's loss uses the discriminator's output, which becomes stale after the discriminator's weight update step. This also seems unlikely, since set_requires_grad(self.discriminator, False) should force gradient independence from the discriminator's activations (hopefully), as verified by:

for child in self.discriminator.children():
    for name, param in child.named_parameters():
        if param.requires_grad:
            print("******", name, param.shape)
            # should not print anything here

That's not the case: the backward call could still try to compute gradients in self.discriminator if a valid computation graph was created before you froze the parameters.
Here is a small example showing the error:

import torch
import torch.nn as nn

modelA = nn.Sequential(
    nn.Linear(1, 1),
    nn.ReLU(),
    nn.Linear(1, 1))

modelB = nn.Sequential(
    nn.Linear(1, 1),
    nn.ReLU(),
    nn.Linear(1, 1))

optA = torch.optim.SGD(modelA.parameters(), lr=1.)
optB = torch.optim.SGD(modelB.parameters(), lr=1.)

x = torch.randn(1, 1)
out = modelA(x)
out.mean().backward(retain_graph=True)
optA.step()  # in-place update of modelA's parameters (bumps their version counters)

# freezing now does not remove modelA from the graph already recorded for `out`
for param in modelA.parameters():
    param.requires_grad = False

out = modelB(out)  # works with modelB(out.detach())
out.mean().backward()  # RuntimeError: backward traverses into modelA's graph,
                       # where the weights are no longer at the recorded version

If you properly .detach() the input to modelB, the code works fine.
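For completeness, the working variant of the last two lines (same setup as above):

out = modelB(out.detach())  # cuts the graph at modelA's output, so backward
                            # never revisits modelA's updated parameters
out.mean().backward()
optB.step()                 # only modelB is updated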