BiGAN implementation

Hi,

I am trying to reproduce the results of this repo related to BiGAN.

Here is the error that I get:

one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [5, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The part of code related to this error:

 def run_on_batch(model,discriminator,data,mask,decay,rdecay,args, optimizer,optimizer_d,epoch):
        ret_f,ret = model(data, mask, decay,rdecay,args)
        
        disc = discriminator(ret['originals'], mask, args)
        #print("BATCH LOSS",ret['loss'])
        #print("BATCH LOSS",disc['loss_g'])
        #print("BATCH LOSS",disc['loss_d'])
        #print("one batch done")

        if optimizer is not None:
            #print("OPTIMIZE")
            
            if (epoch%10==0):
                optimizer_d.zero_grad()
                disc['loss_d'].backward(retain_graph=True)
                optimizer_d.step()

            optimizer.zero_grad()
            (ret['loss']+disc['loss_g']).backward()
            optimizer.step()

        return ret_f,ret,disc

I also do not know the reason for back propagating the loss_d every 10 epochs?

I guess the error is raised by keeping the computation graph alive via:

disc['loss_d'].backward(retain_graph=True)

Could you explain why retain_graph=True is used as its often used as a workaround masking other (real) issues causing new ones.

I am not sure but maybe it is related to the this part of the paper(the repo is the implementation of this paper):

while we are training the discriminator, we want to maximize the probability of correctly classifying the actual (as real) and the generated values (as fake). and beside that simultaneously the generator wants to minimise the probability that D correctly identifies the fake instances. so when we are back propagating the discriminator loss we also need to backprobagate the generator loss at same time so we need to keep the computation_graph same between these 2 backpropagations.

Even if you need to keep the computation graph alive to backpropagate multiple times through it you should still make sure the last backward call clears it by using the default retain_graph=False argument.

which part of the code should be changed to make sure for it? In this code, the last backward has the default value for retain_graph. Is not it enough?