How to train two networks at once in PyTorch?

I am trying to train two networks at once, calling backward() on two different losses, but the second backward() call breaks everything:

Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I do it this way:

        optimizer_d.zero_grad()

        npArray = randomNpArray(1, 3, 4, 11)
        randomTensor = numpyToTensor(npArray).cuda().float()        

        guess_real = discriminator(image)
        loss_d_real = criterion(guess_real, Tensor([1,0]).cuda() )
        realLossesArray.append(loss_d_real.item())
        
        generatedImage = generator(randomTensor)  

        guess_fake = discriminator(generatedImage)
        loss_d_fake = criterion(guess_fake, Tensor([0,1]).cuda() )
        fakeLossesArray.append(loss_d_fake.item())        

        loss_d = loss_d_real + loss_d_fake
        loss_d.backward()
        optimizer_d.step()

        # -------------------
        optimizer_d.zero_grad()
        optimizer_g.zero_grad()

        loss_g = criterion( guess_fake, Tensor([1,0]).cuda() )

        generatorLossesArray.append(loss_g.item())
        
        loss_g.backward()   # <-- the error above is raised here
        optimizer_g.step()

I don’t understand how to tell PyTorch that the second gradient computation is completely independent of the first one.

I’d appreciate any help.

When you call backward() the first time, it computes all the gradients and then frees the computational graph from memory. Since loss_g needs guess_fake, whose graph no longer exists, the second backward() fails. I suggest recalculating guess_fake after one step of the discriminator.
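
To see the mechanism in isolation, here is a tiny self-contained toy example (not from your code, just an illustration): a second backward() through the same graph fails, while recomputing the forward pass builds a fresh graph and works.

        import torch

        x = torch.randn(3, requires_grad=True)

        y = (x * x).sum()    # forward pass builds a graph
        y.backward()         # backward frees the graph's buffers
        # y.backward()       # would raise "Trying to backward through the graph a second time ..."

        y = (x * x).sum()    # recompute the forward pass -> a new graph
        y.backward()         # this backward() works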

You could also pass retain_graph=True, but then the whole graph is kept in memory, which can cause a CUDA out-of-memory error. So just recalculate whatever you need for the second backward() after the first backward().
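
A minimal sketch of what that restructured step could look like, reusing the names from your code (generator, discriminator, criterion, image, the optimizers and the randomNpArray / numpyToTensor helpers are assumed to be defined as in the question). The .detach() in the discriminator step is an extra detail I added: it keeps the discriminator backward pass from also consuming the generator's part of the graph; without it you would need to rerun the generator forward pass as well before the generator step.

        # ---- discriminator step ----
        optimizer_d.zero_grad()

        npArray = randomNpArray(1, 3, 4, 11)
        randomTensor = numpyToTensor(npArray).cuda().float()

        guess_real = discriminator(image)
        loss_d_real = criterion(guess_real, Tensor([1, 0]).cuda())

        generatedImage = generator(randomTensor)
        # detach() so loss_d.backward() does not run through (and free)
        # the generator's part of the graph
        guess_fake = discriminator(generatedImage.detach())
        loss_d_fake = criterion(guess_fake, Tensor([0, 1]).cuda())

        loss_d = loss_d_real + loss_d_fake
        loss_d.backward()
        optimizer_d.step()

        # ---- generator step ----
        optimizer_g.zero_grad()

        # fresh discriminator forward pass: builds a new graph that also
        # reaches back through the still-intact generator graph
        guess_fake = discriminator(generatedImage)
        loss_g = criterion(guess_fake, Tensor([1, 0]).cuda())

        loss_g.backward()
        optimizer_g.step()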


Alright, awesome, thank you very much for your help. Now I understand it and it works :).