Tensor version mismatch when calling .backward()

I am trying to implement a method that has three sub-networks: an encoder, a discriminator (the domain_predictor in the code below), and a regressor.
The output of the encoder is fed to both the discriminator and the regressor.
We also define three optimizers, as follows.

optimizer = optim.Adam(list(encoder.parameters()) + list(regressor.parameters()), lr=1e-4)
optimizer_conf = optim.Adam(list(encoder.parameters()), lr=1e-4)
optimizer_dm = optim.Adam(list(domain_predictor.parameters()), lr=1e-4)

Here’s my code.

# First update the encoder and regressor
optimizer.zero_grad()
features = encoder(data)
output_pred = regressor(features)
loss_total = criteron(output_pred, target)
loss_total.backward(retain_graph=True)
optimizer.step()

# Now update just the domain_predictor
optimizer_dm.zero_grad()
output_dm = domain_predictor(features.detach())
loss_dm = domain_criterion(output_dm, domain_target)
loss_dm.backward(retain_graph=False)
optimizer_dm.step()

# Now update just the encoder using the domain loss
optimizer_conf.zero_grad()
output_dm_conf = domain_predictor(features)
loss_conf = beta * conf_criterion(output_dm_conf, domain_target)
loss_conf.backward(retain_graph=False)
optimizer_conf.step()

This is the error I am getting (I have also enabled anomaly detection):

one of the variables needed for gradient computation has been modified by an inplace operation:
[torch.cuda.FloatTensor [1]] is at version 2; expected version 1 instead.
Hint: the backtrace further above shows the operation that failed to compute its gradient.
The variable in question was changed in there or anywhere later. Good luck!
  File "/home/new_user/DA/Brain-MR-Segmentation-Playground/methods/unlearn.py", line 303, in train_unlearn
    loss_conf.backward(retain_graph=False)
  File "/home/new_user/DA/Brain-MR-Segmentation-Playground/methods/unlearn.py", line 524, in cmd_train
    loss, acc, dm_loss, conf_loss = train_unlearn(ctx, models, train_dataloaders, optimizers, criterions,
  File "/home/new_user/DA/Brain-MR-Segmentation-Playground/methods/runner.py", line 31, in run_main
    selected_method.cmd_train(ctx)
  File "/home/new_user/DA/Brain-MR-Segmentation-Playground/methods/runner.py", line 35, in <module>
    run_main()
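For completeness, anomaly detection is enabled with the standard autograd switch before the training loop starts (shown here only as a sketch, not the exact line from my script):

import torch

# enabled once, before the training loop, so backward() also prints the
# forward trace of the operation that later fails to compute its gradient
torch.autograd.set_detect_anomaly(True)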

Kindly help!!!

I guess the error is raised because you are using stale forward activations of the encoder during the loss_conf.backward() call (a minimal repro is sketched right after this list).
I.e.:

  • features = encoder(data) will store the intermediate activations (a0)
  • loss_total.backward(retain_graph=True) will use these intermediate activations (a0), and optimizer.step() will update the original parameters of encoder (p0) to a new set (p1)
  • output_dm_conf = domain_predictor(features) does not detach the features tensor, so it is still attached to the encoder's graph
  • loss_conf.backward() will try to calculate the gradients for domain_predictor (this should work) as well as for encoder using the stored forward activations (a0)
  • since optimizer.step() was already performed, these a0 activations no longer match the updated parameters p1, which raises the error because the gradient would be mathematically wrong.
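Here is a minimal, self-contained sketch of the same failure mode. The modules, shapes, and the SGD optimizer are toy stand-ins I made up (not your real encoder / regressor / domain_predictor); only the ordering of backward() and step() matters:

import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))  # stand-in "encoder"
head = nn.Linear(4, 1)                                            # stand-in "regressor"
dm = nn.Linear(4, 1)                                              # stand-in "domain_predictor"

opt = torch.optim.SGD(list(enc.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(2, 4)
feat = enc(x)                       # graph saves activations/params as they are now (a0, p0)

loss1 = head(feat).sum()
loss1.backward(retain_graph=True)   # fine: the saved tensors still match p0
opt.step()                          # in-place update p0 -> p1 bumps the version counters

loss_conf = dm(feat).sum()          # new graph for dm, but still attached to enc's old graph
loss_conf.backward()                # RuntimeError: one of the variables needed for gradient
                                    # computation has been modified by an inplace operation

The last backward has to go through the encoder part of the graph that was built with p0, but opt.step() has already overwritten those parameters in place, which is exactly what the version-counter check complains about.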

Thank you for the reply.
I am not sure I entirely understood the reason for the problem or how to solve it.

You mentioned that the features aren't being detached; however, in the second block we have:

optimizer_dm.zero_grad()
output_dm = domain_predictor(features.detach())
loss_dm = domain_criterion(output_dm, domain_target)
loss_dm.backward(retain_graph=False)
optimizer_dm.step()

we are using features.detach().
What should we do to get the implementation right?

I don't think the second block creates the issue (you could run your code and check exactly where it is failing), and in the third block you are not detaching the features tensor:

output_dm_conf = domain_predictor(features)

To solve the issue, either detach the input (if that fits your use case) or revisit my previous post and explain how your use case is supposed to work when it relies on stale forward activations.
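For what it's worth, if the intention is that loss_conf should still update the encoder (so detaching is not an option in the third block), one possible restructuring is to recompute the forward pass after the parameter updates, so the confusion loss backpropagates through a graph that matches the current parameters. This is only a sketch reusing the names from your snippet, not a drop-in fix:

# First update the encoder and regressor
optimizer.zero_grad()
features = encoder(data)
output_pred = regressor(features)
loss_total = criteron(output_pred, target)
loss_total.backward()                      # no retain_graph needed any more
optimizer.step()

# Update just the domain_predictor on detached features
optimizer_dm.zero_grad()
output_dm = domain_predictor(features.detach())
loss_dm = domain_criterion(output_dm, domain_target)
loss_dm.backward()
optimizer_dm.step()

# Recompute the features with the *updated* encoder so the graph matches
# the current parameter versions, then update only the encoder
optimizer_conf.zero_grad()
features_conf = encoder(data)              # fresh forward pass, fresh saved tensors
output_dm_conf = domain_predictor(features_conf)
loss_conf = beta * conf_criterion(output_dm_conf, domain_target)
loss_conf.backward()
optimizer_conf.step()

The price is one extra encoder forward pass per iteration, so whether this is acceptable depends on your use case; the gradients that loss_conf leaves on domain_predictor's parameters are discarded by optimizer_dm.zero_grad() at the start of the next iteration.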