Problem with inplace operations

The goal is to train this GAN with multiple loss functions, but I keep getting an error about in-place operations. Here is the problem section:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_2885/1167578605.py in <module>
     28 
     29         opt_gen.zero_grad()
---> 30         lossG.backward()
     31         opt_gen.step()
     32 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 784]], which is output 0 of UnsqueezeBackward0, is at version 33; expected version 32 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

If any additional information is needed, please let me know. Thanks!

The in-place operation probably happens at adv_ex[idx] = adv_ex[idx] + purturbation.
You can create a new tensor to store the result of the addition instead of writing back into adv_ex.
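
Also, as the error's own hint suggests, you can enable anomaly detection before the training loop; backward() will then print the forward-pass traceback of the operation that broke the graph, which makes the offending line much easier to find:

import torch

torch.autograd.set_detect_anomaly(True)  # prints the forward trace of the op that fails in backward()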

Sorry, I’m new to PyTorch. How would I go about doing that (while keeping the same size and label)?

You can use adv_ex[idx] = adv_ex[idx].clone() + purturbation

This still looks in-place to me, since it still writes into the tensor adv_ex by index. But perhaps I am wrong. Does this fix your error message, @machina?

To my understanding, you should do it this way:

...
tmp_adv_ex = []
for idx, item in enumerate(adv_ex):
    tmp_adv_ex.append(adv_ex[idx] + purturbation)
adv_ex = torch.cat(tmp_adv_ex, dim=0) # builds a new tensor, so no in-place modification
...
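
For context: indexed assignment like adv_ex[idx] = ... writes into the existing tensor's storage and bumps its autograd version counter, whereas appending to a Python list and calling torch.cat allocates a brand-new tensor. Here is a minimal, standalone repro of this class of error (purely illustrative, nothing from your model):

import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()          # exp() saves its output for the backward pass
y[0] = 5.0           # in-place write bumps y's version counter
y.sum().backward()   # RuntimeError: ... modified by an inplace operation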

That gets me past the lines I posted above, thank you!
Now I’m getting another error below:

        ### Train Discriminator: max log(D(x)) + log(1 - D(G(z)))
        
        disc_real = disc(real).view(-1)
        disc_fake = disc(adv_ex).view(-1)
        lossD = -torch.mean(torch.log(disc(real)) + torch.log(1. - disc(adv_ex)))
        # can decide later how much that loss term weighs
        
        opt_disc.zero_grad()
        lossD.backward(retain_graph=True)
        opt_disc.step()

The error happens at the line that computes disc_fake:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_25258/819410292.py in <module>
     47         ### Train Discriminator: max log(D(x)) + log(1 - D(G(z)))
     48         disc_real = disc(real).view(-1)
---> 49         disc_fake = disc(adv_ex).view(-1)
     50         lossD = -torch.mean(torch.log(disc(real)) + torch.log(1. - disc(adv_ex)))
     51         # can decide later how much that loss term weighs
                                   ...
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

 ** On entry to SGEMM  parameter number 10 had an illegal value

How should I interpret this error?

Also, here is the whole block so far for reference:

for epoch in range(num_epochs):
    for batch_idx, (real, labels) in enumerate(loader):
        # get a fixed input batch (first batch of first epoch) to display gen output
        if batch_idx == 0 and epoch == 0:
            fixed_input = real.view(-1, 784).to(device)
        
        adv_ex = real.clone().reshape(-1, 784).to(device) # [32, 784] adversarial-example copy of the batch, flattened
        real = real.view(-1, 784).to(device) # [32, 784] real batch, flattened
        labels = labels.to(device) # size [32], 32 labels in the batch
        
        
        # perturb each image in adv_ex
        tmp_adv_ex = []
        for idx, item in enumerate(adv_ex):
            purturbation = gen(adv_ex[idx])
            tmp_adv_ex.append(adv_ex[idx] + purturbation)
        adv_ex = torch.cat(tmp_adv_ex, dim=0)
        adv_ex = real.clone().reshape(-1, 784).to(device) # note: overwrites adv_ex, discarding the perturbations above
        
         
        ### Train Generator: min log(1 - D(G(z))) <-> max log(D(G(z)))
        output = disc(adv_ex).view(-1)
        lossG = torch.mean(torch.log(1. - output)) # loss for the generator's desired disc prediction

        adv_ex = adv_ex.reshape(-1, 1, 28, 28)
        f_pred = target(adv_ex)
        f_loss = CE_loss(f_pred, labels) # loss for the generator's desired target-model prediction
        loss_G_Final = f_loss + lossG # can change the weight of this loss term later

        opt_gen.zero_grad()
        loss_G_Final.backward()
        opt_gen.step()
        
        ### Train Discriminator: max log(D(x)) + log(1 - D(G(z)))
        
        disc_real = disc(real).view(-1)
        disc_fake = disc(adv_ex).view(-1)
        lossD = -torch.mean(torch.log(disc(real)) + torch.log(1. - disc(adv_ex)))
        # can decide later how much that loss term weighs
        
        opt_disc.zero_grad()
        lossD.backward(retain_graph=True)
        opt_disc.step()
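
CUBLAS_STATUS_INVALID_VALUE on an SGEMM call almost always means a tensor with unexpected dimensions reached a matrix multiply on the GPU, i.e. a shape mismatch inside a Linear layer. CUDA reports these asynchronously and cryptically; running the same batch on CPU usually produces a readable message such as "mat1 and mat2 shapes cannot be multiplied" instead. In the block above, adv_ex is reshaped to (-1, 1, 28, 28) for target during the generator step and never flattened back, so if disc starts with a Linear layer that expects (batch, 784) inputs (an assumption, since disc's definition isn't shown), disc(adv_ex) in the discriminator step would fail exactly like this. A sketch of the check and the likely fix, under that assumption:

# confirm the shape that reaches the discriminator
print(adv_ex.shape)  # (-1, 1, 28, 28) after the reshape in the generator step

# flatten back before the discriminator call
disc_fake = disc(adv_ex.view(-1, 784)).view(-1)

Separately, note that adv_ex = real.clone().reshape(-1, 784).to(device) right after the perturbation loop replaces the perturbed batch with a plain copy of real, so the generator's perturbations never reach either loss; that looks like a leftover line.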