Help me find an in-place error

Hi,
I am trying to implement a pix2pix GAN; here is my architecture.

Generator

import torch
import torch.nn as nn

ij = True  # shared inplace flag for the activations

class gen(nn.Module):
    def __init__(self):
        super(gen, self).__init__()
        # encoder: 256 -> 128 -> 64 -> 32 -> 16 -> 13 (bottleneck),
        # decoder: 13 -> 16 -> 32 -> 64 -> 128 -> 256
        self.main = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1),
            nn.ReLU(ij),
            nn.Conv2d(64, 128, 4, 2, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(ij),
            nn.Conv2d(128, 256, 4, 2, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(ij),
            nn.Conv2d(256, 512, 4, 2, 1),
            nn.BatchNorm2d(512),
            nn.ReLU(ij),
            nn.Conv2d(512, 1024, 4),
            nn.ConvTranspose2d(1024, 512, 4, 1),
            nn.BatchNorm2d(512),
            nn.ReLU(ij),
            nn.ConvTranspose2d(512, 256, 4, 2, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(ij),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(ij),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(ij),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),
            nn.Tanh()
        )

    def forward(self, inp):
        return self.main(inp)
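
As a sanity check (assuming 256x256 inputs, which is what I use in the training loop below), the generator maps a (1, 3, 256, 256) tensor back to (1, 3, 256, 256):

g = gen()
x = torch.randn(1, 3, 256, 256)
print(g(x).shape)  # torch.Size([1, 3, 256, 256])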

Discriminator

class diff(nn.Module):
    def __init__(self):
        super(diff, self).__init__()
        # operates on the 6-channel (generated, target) pair:
        # 256 -> 128 -> 64 -> 32 -> 16 -> 13 -> 1
        self.main = nn.Sequential(
            nn.Conv2d(6, 64, 4, 2, 1),
            nn.LeakyReLU(0.2, inplace=ij),
            nn.Conv2d(64, 128, 4, 2, 1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=ij),
            nn.Conv2d(128, 256, 4, 2, 1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=ij),
            nn.Conv2d(256, 512, 4, 2, 1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=ij),
            nn.Conv2d(512, 256, 4),
            nn.LeakyReLU(0.2, inplace=ij),
            nn.Conv2d(256, 1, 13),
            nn.Sigmoid()
        )

    def forward(self, r1, r2):
        # concatenate along the channel dimension -> (1, 6, 256, 256)
        r = torch.cat((r1, r2), dim=1)
        return self.main(r)
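
The discriminator takes the image pair and reduces it to a single score, so its output shape matches the (1, 1, 1, 1) labels in the loop below:

d = diff()
r1 = torch.randn(1, 3, 256, 256)
r2 = torch.randn(1, 3, 256, 256)
print(d(r1, r2).shape)  # torch.Size([1, 1, 1, 1])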

Iteration loop

import random  # used for the soft labels below

# geng (an instance of gen), dig (an instance of diff), og and od (their
# optimizers), loss_fn, norm, X, Y, count, iteration, batch_size and
# device are defined earlier (not shown)
torch.autograd.set_detect_anomaly(True)
while iteration<10000:
  gen_loss=0
  dig_loss=0
  gen_im=[]
  actual_image=[]
  ########################################train generator##########################
  for i in range(batch_size):
    im=X[count].float()/255
    im=im.view(3,256,256)
    im=norm(im)
    im=im.view(1,3,256,256)
    
    pred=geng(im.float())
    gen_im.append(pred)
    score=random.uniform(0.85,1)
    label=torch.tensor(score).view(1,1,1,1).to(device)

    im_actual=Y[count].float()
    im_actual=norm(im_actual.view(3,256,256)).view(1,3,256,256)
    actual_image.append(im_actual)

    class_predicted=dig(pred,im_actual)
    
    loss_gen=loss_fn(class_predicted,label)
    
    gen_loss=gen_loss+loss_gen
    count=count+1

  gen_loss=gen_loss/batch_size
  gen_loss.backward(retain_graph=True)
  og.step()
  geng.zero_grad()
  dig.zero_grad()


  #########################################train discriminator######################
  for i in range(batch_size):
    score_fake=random.uniform(0.0,0.15)
    score_real=random.uniform(0.85,1.00)

    label_fake=torch.tensor(score_fake).view(1,1,1,1).to(device)
    label_real=torch.tensor(score_real).view(1,1,1,1).to(device)

    loss_real=loss_fn(dig(actual_image[i],actual_image[i]),label_real)
    loss_fake=loss_fn(dig(gen_im[i],actual_image[i]),label_fake)
  
    dig_loss=dig_loss+(loss_real+loss_fake)
  di_loss=dig_loss/batch_size
  di_loss.backward()
  od.step()

This is the error that I encounter:

RuntimeError                              Traceback (most recent call last)

<ipython-input-72-85806ea5cb74> in <module>()
     49     dig_loss=dig_loss+(loss_real+loss_fake)
     50   di_loss=dig_loss/batch_size
---> 51   di_loss.backward()
     52   od.step()
     53 

1 frames

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     98     Variable._execution_engine.run_backward(
     99         tensors, grad_tensors, retain_graph, create_graph,
--> 100         allow_unreachable=True)  # allow_unreachable flag
    101 
    102 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 3, 4, 4]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!


Please help me find the in-place operation. I know that the fault is not in ReLU's inplace flag, as I have used ReLU with inplace=True before and did not encounter this error.

I think this might be an issue with Colab, as the code runs perfectly on my machine.

There are a couple of things going on with your code. You should include exactly how you are defining the optimizers; since the snippet is not complete, I cannot try to reproduce the exact error you are getting. However:

You have already applied a step of your optimizer to geng, yet in the discriminator loop you are using its outputs gen_im to compute the discriminator's loss. Even if it is not causing any trouble, try detaching them with .detach() or, if you actually want to backpropagate the error to the generator, generate these images once more after og.step().
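
For example, in your discriminator loop (a minimal sketch reusing the names from your snippet):

# a detached copy cuts the discriminator loss off from the
# generator's graph, so backward() no longer walks through geng
loss_fake = loss_fn(dig(gen_im[i].detach(), actual_image[i]), label_fake)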

Given that you retain the graph, it is also possible that you are reapplying past gradients.
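
The usual pattern avoids retain_graph=True entirely: per training pair, finish the discriminator's backward and step on detached fakes first, then rebuild the generator loss with a fresh forward pass. A rough sketch with your variable names (label being the "real" target from your generator loop):

od.zero_grad()
d_loss = loss_fn(dig(im_actual, im_actual), label_real) \
       + loss_fn(dig(pred.detach(), im_actual), label_fake)
d_loss.backward()
od.step()

og.zero_grad()
g_loss = loss_fn(dig(pred, im_actual), label)  # fresh pass through dig
g_loss.backward()
og.step()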

Hope these help!

Thank you for the reply.
I solved the error by switching to PyTorch 1.4.0.
The issue seems to be specific to PyTorch 1.5.0: I found similar reports on GitHub, and they were also resolved by downgrading to an earlier version. (As far as I understand, 1.5 is stricter about in-place updates: og.step() modifies the generator's weights in place, and di_loss.backward() then walks the retained graph that still needs them, which trips the version check.)

Also, thank you for pointing out the mistake I made regarding backprop through the generator.