Can anyone kindly take a look at my Jupyter notebook and let me know which part is erroneous? Thanks so much… Here is the model I spent a whole day working on: https://github.com/tlkahn/my-notebooks/blob/master/GAN-pytorch.ipynb
Just by skimming through your code, it seems you are freezing the discriminator:
```
class GAN(nn.Module):
    """GAN model"""
    def __init__(self, generator, discriminator):
        super().__init__()
        self.generator = generator
        self.discriminator = discriminator
        for param in self.discriminator.parameters():
            param.requires_grad = False

    def forward(self, x):
        gen_img = self.generator(x)
        return self.discriminator(gen_img)
```
which means it will never be trained. Is this on purpose, or were you planning to unfreeze the parameters at some point (e.g. by toggling requires_grad during the alternating updates, as in the sketch below)?
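For illustration only, here is a minimal sketch of that pattern. It assumes generator, discriminator, their optimizers, real_images and z already exist, and it uses placeholder real_loss/fake_loss helpers; it is not taken from the notebook:

```
def set_requires_grad(model, flag):
    """Enable/disable gradients for every parameter of a model."""
    for param in model.parameters():
        param.requires_grad = flag

# (1) discriminator update: D must be trainable; the fakes are detached,
#     so the generator is not touched by this step
set_requires_grad(discriminator, True)
d_optimizer.zero_grad()
d_loss = real_loss(discriminator(real_images)) + fake_loss(discriminator(generator(z).detach()))
d_loss.backward()
d_optimizer.step()

# (2) generator update: freeze D so only G's parameters receive updates;
#     gradients still flow through D back to G's output
set_requires_grad(discriminator, False)
g_optimizer.zero_grad()
g_loss = real_loss(discriminator(generator(z)))  # flipped labels
g_loss.backward()
g_optimizer.step()
```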
Here are two GANs in PyTorch that are pretty simple and easy to follow, in case they help you.
Here is how to set up the two models, Gen (generator) and Disc (discriminator), and train them:
```
# Set models for training
Disc.train()
Gen.train()

for epoch in range(num_epochs):
    # Each batch
    for batch_i, (real_images, _) in enumerate(train_loader):
        batch_size = real_images.size(0)

        ## Important rescaling step ##
        # rescale input images from [0, 1) to [-1, 1)
        real_images = real_images * 2 - 1

        # ---------------------------
        #  Discriminator training
        # ---------------------------
        d_optimizer.zero_grad()

        # Train with real images
        D_real = Disc(real_images)
        d_real_loss = real_loss(D_real, smooth=True)  # use label smoothing

        # Next, train with fake images
        # Generate fake images
        z = np.random.uniform(-1, 1, size=(batch_size, z_size))  # random noise
        z = torch.from_numpy(z).float()  # convert to a float tensor
        fake_images = Gen(z)  # forward through the generator (do NOT train the generator here; train one model at a time)

        # Compute fake loss
        D_fake = Disc(fake_images)
        d_fake_loss = fake_loss(D_fake)

        # add up losses and backprop
        d_loss = d_real_loss + d_fake_loss
        d_loss.backward()
        d_optimizer.step()

        # ---------------------------
        #  Generator training
        # ---------------------------
        g_optimizer.zero_grad()

        # Generate fake images and train
        z = np.random.uniform(-1, 1, size=(batch_size, z_size))
        z = torch.from_numpy(z).float()
        fake_images = Gen(z)

        # Compute the discriminator loss on fake images
        # using flipped labels!
        D_fake = Disc(fake_images)
        g_loss = real_loss(D_fake)  # use real_loss to flip the labels

        # perform backprop
        g_loss.backward()
        g_optimizer.step()

        # Print some loss stats
        if batch_i % print_every == 0:
            # print discriminator and generator loss
            print('Epoch [{:5d}/{:5d}] | d_loss: {:6.4f} | g_loss: {:6.4f}'.format(
                epoch + 1, num_epochs, d_loss.item(), g_loss.item()))

    ## AFTER EACH EPOCH ##
    # append discriminator loss and generator loss
    losses.append((d_loss.item(), g_loss.item()))

    # generate and save sample fake images
    Gen.eval()  # eval mode for generating samples
    samples_z = Gen(fixed_z)
    samples.append(samples_z)
    Gen.train()  # back to train mode
```
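The snippet above calls real_loss and fake_loss helpers that are not shown. A minimal sketch of what they typically look like, assuming Disc outputs raw logits (hence nn.BCEWithLogitsLoss) and using 0.9 as the smoothed real label:

```
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # expects raw logits from the discriminator

def real_loss(D_out, smooth=False):
    """Loss for samples the discriminator should classify as real."""
    batch_size = D_out.size(0)
    # label smoothing: use 0.9 instead of 1.0 as the "real" target
    labels = torch.ones(batch_size) * (0.9 if smooth else 1.0)
    return bce(D_out.squeeze(), labels)

def fake_loss(D_out):
    """Loss for samples the discriminator should classify as fake."""
    batch_size = D_out.size(0)
    labels = torch.zeros(batch_size)
    return bce(D_out.squeeze(), labels)
```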
Thanks a lot. I pulled those lines out of the GAN model and moved them to a later training phase, and it seems to be working:
```
for param in self.discriminator.parameters():
    param.requires_grad = False
```
Thanks. Very useful references.
Hi everyone, I am getting RuntimeError: element 11 of tensors does not require grad and does not have a grad_fn when running a GAN architecture to model a Gaussian distribution, and I am using a WGAN loss as well.
How can I fix this error so that I can get the required output from my model?
Double post from here.
Hi Ptrblck,
I hope you are well. Sorry, I need to run a 3D GAN. My inputs are grayscale patches in 3D, and I want to create 3D patches as well.
Can I use any 2D GAN and just convert the 2D layers to 3D?
Could you please suggest a PyTorch link for a DCGAN in 3D?
Many thanks
I don’t know of any recent 3D DCGAN implementations, but you could try the approach of Voxel DCGAN or 3DGAN, which are both a bit older by now.
If you are working with static volumetric shapes, you could use the depth dimension as the channel dimension in a standard 2D GAN, although I don’t know how well this would work.
I think your suggestion makes sense, and you could try to replace all nn.*2d layers with their 3D equivalents, as sketched below.
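For illustration only, a minimal sketch of that kind of replacement (the layer choices and channel sizes here are made up, not taken from the thread):

```
import torch
import torch.nn as nn

# 2D block operating on (N, C, H, W) inputs
block_2d = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)

# 3D equivalent operating on (N, C, D, H, W) volumes:
# every *2d layer is swapped for its *3d counterpart
block_3d = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm3d(16),
    nn.ReLU(inplace=True),
)

x2d = torch.randn(2, 1, 64, 64)      # batch of 2D patches
x3d = torch.randn(2, 1, 16, 64, 64)  # batch of 3D patches
print(block_2d(x2d).shape)  # torch.Size([2, 16, 64, 64])
print(block_3d(x3d).shape)  # torch.Size([2, 16, 16, 64, 64])
```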
Many thanks. I will tell you the results.
Hi Ptrblck,
Sorry, I am using a 2D-DCGAN generator and converting it to 3D. The error is: Given input size per channel: (1 x 1 x 1). Calculated output size per channel: (5 x 5 x -157). Output size is too small.
Instead of -157, I expected to see (5x5x33), (13x13x21), (21x21x11). Would you please help me find out why this happens? I start from a 101-channel 1x1x1 input and my target output is (21x21x11).
```
class Generator(nn.Module):
    def __init__(self, nz):
        super(Generator, self).__init__()
        self.nz = nz
        self.main = nn.Sequential(
            nn.ConvTranspose3d(101, 33, kernel_size=(5, 5, 5), stride=(2, 2, 2), padding=(0, 0, 86), bias=False),
            nn.BatchNorm3d(33),
            nn.ReLU(True),
            nn.ConvTranspose3d(33, 21, kernel_size=(5, 5, 5), stride=(2, 2, 2), padding=(0, 0, 24), bias=False),
            nn.BatchNorm3d(21),
            nn.ReLU(True),
            nn.ConvTranspose3d(21, 11, kernel_size=(5, 5, 5), stride=(2, 2, 2), padding=(4, 4, 17), bias=False),
            nn.Tanh())

    def forward(self, input):
        return self.main(input)
# call the generator with nz = 101
netG = Generator(101).to(device)
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))
netG.apply(weights_init)

noise = torch.randn(b_size, nz, 1, 1, 1, device=device)
fake = netG(noise)  # call the model instance, not the Generator class
```
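Not part of the thread, but one way to sanity-check such shape errors: for dilation 1, the nn.ConvTranspose3d output size along each dimension is (in - 1) * stride - 2 * padding + kernel_size + output_padding, so a large padding applied to a size-1 input drives the result negative. A small sketch using the kernel and stride of the posted first layer:

```
# nn.ConvTranspose3d output size along one dimension (dilation = 1)
def convtranspose_out(size, kernel=5, stride=2, padding=0, output_padding=0):
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# first posted layer, per-channel input size 1 x 1 x 1
print(convtranspose_out(1, padding=0))   # 5 -> matches the "5 x 5" part of the error
print(convtranspose_out(1, padding=86))  # negative -> "Output size is too small"
```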
Hi Ptrblck,
I want to use zero-mean, unit-variance normalization for the discriminator inputs (not rescaling to [-1, 1]). In that case I should remove the Tanh from the generator. Which activation function do you recommend instead of Tanh in the generator's last layer, or can I run without any activation function at the end?
Hi, if your output spans between 0 and 1, you could use nn.Sigmoid().
Without an activation, your output will be an arbitrary number, not constrained to [-1, 1] or [0, 1].
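A minimal sketch of the three options discussed, shown only for the generator's final layer (the layer shapes here are made up):

```
import torch
import torch.nn as nn

x = torch.randn(4, 64, 11, 11)  # made-up feature map from the previous layer
conv = nn.ConvTranspose2d(64, 1, kernel_size=3, stride=2, padding=1, bias=False)

out_tanh    = torch.tanh(conv(x))     # bounded to (-1, 1): pair with [-1, 1]-scaled inputs
out_sigmoid = torch.sigmoid(conv(x))  # bounded to (0, 1): pair with [0, 1]-scaled inputs
out_linear  = conv(x)                 # unbounded: e.g. for zero-mean / unit-variance targets
```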
Hi Ptrblck,
I implemented my conditional GAN. The batch size is 64, the input patch size is 21x21, and the condition I pass is one of 168 different volume numbers, which can be from 1 to 100, for example: tensor([ 3, 36, 19, 12, 16, 6, 7, 2, 45, 12, 65, 44, 17, 8, 15, 15, 14, 47, 20, 9, 16, 25, 56, 11, 22, 8, 5, 3, 7, 6, 25, 10, 36, 1, 17, 2, 22, 3, 10, 13, 9, 14, 15, 11, 20, 16, 3, 10, 4, 18, 1, 15, 9, 6, 16, 55, 1, 14, 6, 17, 6, 6, 10, 7]). My settings are:
- img_size = 21
- N_Class = 168 (168 different volumes as conditions)
- lr1 = 0.0002
- lr2 = 0.0002
- batch_size = 64
- optimizer: Adam with default settings
- ngf = 64
- criterion = nn.BCELoss()
- real_label = 1
- fake_label = 0
I applied the code below. It runs without any error, but the fake images are not meaningful, they are just noise.
Would you please help me with that? The code is:
```
# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

# Generator Code
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.l1 = nn.Sequential(nn.ConvTranspose2d(100, ngf * 4, 3, 1, 0, bias=False),
                                nn.BatchNorm2d(ngf * 4), nn.ReLU(True))
        self.l2 = nn.Sequential(nn.ConvTranspose2d(168, ngf * 4, 3, 1, 0, bias=False),
                                nn.BatchNorm2d(ngf * 4), nn.ReLU(True))
        self.l3 = nn.Sequential(nn.ConvTranspose2d(ngf * 8, ngf * 4, 3, 1, 0, bias=False),
                                nn.BatchNorm2d(ngf * 4),
                                nn.ReLU(True))
        self.l4 = nn.Sequential(nn.ConvTranspose2d(ngf * 4, ngf * 2, 3, 1, 0, bias=False),
                                nn.BatchNorm2d(ngf * 2),
                                nn.ReLU(True))
        self.l5 = nn.Sequential(nn.ConvTranspose2d(ngf * 2, ngf, 3, 2, 1, bias=False),
                                nn.BatchNorm2d(ngf),
                                nn.ReLU(True))
        self.l6 = nn.Sequential(nn.ConvTranspose2d(ngf, 1, 3, 2, 3, bias=False), nn.Sigmoid())

    def forward(self, input, Volume):
        x = self.l1(input)
        y = self.l2(Volume)
        xx = torch.cat([x, y], 1)
        output = self.l3(xx)
        output = self.l4(output)
        output = self.l5(output)
        output = self.l6(output)
        return output
# Create the generator
netG = Generator().to(device)
# initialize all conv weights to mean=0, std=0.02
netG.apply(weights_init)
# print(netG)

fixed_noise = torch.randn(64, nz, 1, 1, device=device)
fixed_noise_Se = torch.randn(4500, nz, 1, 1, device=device)
# ----------- Discriminator ------------------
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.l1 = nn.Sequential(nn.Conv2d(1, int(ndf / 2), 4, 2, 1, bias=False),
                                nn.LeakyReLU(0.2, inplace=True))
        self.l2 = nn.Sequential(nn.Conv2d(168, int(ndf / 2), 4, 2, 1, bias=False),
                                nn.LeakyReLU(0.2, inplace=True))
        self.l3 = nn.Sequential(nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
                                nn.BatchNorm2d(ndf * 2),
                                nn.LeakyReLU(0.2, inplace=True))
        self.drop_out3 = nn.Dropout(0.5)
        self.l4 = nn.Sequential(nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
                                nn.BatchNorm2d(ndf * 4),
                                nn.LeakyReLU(0.2, inplace=True))
        self.drop_out4 = nn.Dropout(0.5)
        self.l5 = nn.Sequential(nn.Conv2d(ndf * 4, 1, 4, 2, 1, bias=False),
                                nn.Sigmoid())

    def forward(self, input, Volume):
        x = self.l1(input)
        y = self.l2(Volume)
        out = torch.cat([x, y], 1)
        out = self.l3(out)
        out = self.drop_out3(out)
        out = self.l4(out)
        out = self.drop_out4(out)
        out = self.l5(out)
        return out
# Create the Discriminator
netD = Discriminator(ngpu).to(device)
# Apply the weights_init function to randomly initialize all weights
netD.apply(weights_init)

# Setup Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr1, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr2, betas=(beta1, 0.999))
# Training Loop
print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for pos, neg in zip(trainloader, trainloaderNeg):
        images1, labels, Volumes = pos
        images1 = images1.float()
        Volumes = Volumes.long()
        Negpach = neg
        Negpach = Negpach.float()

        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ############################
        ## Train with all-real batch
        netD.zero_grad()
        # ---------------- format batch inputs ----------------
        real_cpu = images1.to(device)
        # ---------------- add volumes as condition ----------------
        Real_volume = Volumes.to(device).long().squeeze(1)
        Real_volume = Real_volume.type(torch.LongTensor)
        # ---------------- labels for the discriminator ----------------
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, device=device)
        label = label.to(device)
        # ---------------- pass the Volumes (condition) to the discriminator ----------------
        real_y = torch.zeros(batch_size, N_Class)
        real_y = real_y.scatter_(1, Real_volume.view(batch_size, 1), 1).view(batch_size, N_Class, 1, 1).contiguous()
        real_y = Variable(real_y.expand(-1, -1, img_size, img_size)).to(device)
        netD = netD.float()
        # ---------------- apply D on real images and conditions ----------------
        output = netD(real_cpu, real_y).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G and pass the condition
        netG = netG.float()
        gen_labels = (torch.rand(batch_size, 1) * N_Class).type(torch.LongTensor)
        gen_y = torch.zeros(batch_size, N_Class)
        gen_y = Variable(gen_y.scatter_(1, gen_labels.view(batch_size, 1), 1).view(batch_size, N_Class, 1, 1)).to(device)
        fake = netG(noise, gen_y)
        # multiply with negative patch
        fake44 = torch.mul(fake, Negpach)
        # labels for D with fake input
        label.fill_(fake_label)
        label = label.to(device)
        # pass the condition to D
        gen_y_for_D = gen_y.view(batch_size, N_Class, 1, 1).contiguous().expand(-1, -1, img_size, img_size)
        output = netD(fake44.detach(), gen_y_for_D).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ############################
        netG.zero_grad()
        # labels for updating G
        label.fill_(real_label)  # fake labels are real for generator cost
        # label = torch.mul(label, 0.9)
        label = label.to(device)
        # apply D on G's output with real labels
        output = netD(fake44.detach(), gen_y_for_D).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        # Update G
        optimizerG.step()

    if epoch % 1 == 0:
        with torch.no_grad():
            fake = netG(fixed_noise, gen_y).detach().cpu()
        plt.close("all")
        plt.figure()
        plt.figure(figsize=(8, 8))
        plt.axis("off")
        plt.title("Fake Images epoch " + str(epoch))
        plt.imshow(np.transpose(vutils.make_grid(fake.detach().to(device)[:64], padding=2, normalize=True, range=(0.2, 1)).cpu(), (1, 2, 0)))
        plt.savefig(os.path.join(root_dirDurringTraining13 + '/' + 'Epoch=' + str(epoch) + 'Seed=' + str(manualSeed)) + 'fakesor2.jpg')

        fake55 = torch.mul(fake, Negpach)
        plt.close("all")
        plt.figure()
        plt.figure(figsize=(8, 8))
        plt.axis("off")
        plt.title("Fake Images epoch " + str(epoch))
        plt.imshow(np.transpose(vutils.make_grid(fake55.detach().to(device)[:64], padding=2, normalize=True, range=(0, .5)).cpu(), (1, 2, 0)))
        plt.savefig(os.path.join(root_dirDurringTraining13 + '/' + 'Epoch=' + str(epoch)) + 'fakesmul55.jpg')

    torch.save(netG.state_dict(), '%s/netG_epoch_%d.pth' % (root_dirDurringTraining15, epoch))
    torch.save(netD.state_dict(), '%s/netD_epoch_%d.pth' % (root_dirDurringTraining15, epoch))
```
Unfortunately, I cannot be of much use here, as I’m not a GAN expert.
Generally, I would recommend taking a look at similar conditional GAN architectures and at the tricks that were used to make their training converge.
Hi Ptrblck,
I have a question about DCGAN. Would you please tell me what exactly happens in errD = errD_real + errD_fake and what its relation to the optimizer is? The gradients were already computed twice in the previous lines, so how does optimizerD know to use errD for the update?
```
netD.zero_grad()
# Format batch
real_cpu = data[0].to(device)
b_size = real_cpu.size(0)
label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
# Forward pass real batch through D
output = netD(real_cpu).view(-1)
# Calculate loss on all-real batch
errD_real = criterion(output, label)
# Calculate gradients for D in backward pass
errD_real.backward()
D_x = output.mean().item()

## Train with all-fake batch
# Generate batch of latent vectors
noise = torch.randn(b_size, nz, 1, 1, device=device)
# Generate fake image batch with G
fake = netG(noise)
label.fill_(fake_label)
# Classify all fake batch with D
output = netD(fake.detach()).view(-1)
# Calculate D's loss on the all-fake batch
errD_fake = criterion(output, label)
# Calculate the gradients for this batch
errD_fake.backward()
D_G_z1 = output.mean().item()
# Add the gradients from the all-real and all-fake batches
errD = errD_real + errD_fake
# Update D
optimizerD.step()
```
The gradients are calculated by errD_real.backward() and errD_fake.backward(), not by errD, since .backward() is never called on it in the code snippet. The two backward() calls accumulate their gradients in the parameters' .grad attributes, and optimizerD.step() then updates the parameters using those accumulated gradients. I guess errD is just used to print the sum of the real and fake loss for the discriminator.
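A minimal standalone sketch (with a made-up linear "discriminator") illustrating that the two backward() calls accumulate gradients and that the sum is only computed for logging:

```
import torch
import torch.nn as nn

netD = nn.Linear(4, 1)
optimizerD = torch.optim.SGD(netD.parameters(), lr=0.1)
criterion = nn.BCEWithLogitsLoss()

real, fake = torch.randn(8, 4), torch.randn(8, 4)
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

optimizerD.zero_grad()
errD_real = criterion(netD(real), ones)
errD_real.backward()                      # writes gradients into .grad
grad_after_real = netD.weight.grad.clone()

errD_fake = criterion(netD(fake), zeros)
errD_fake.backward()                      # adds to the existing .grad

errD = errD_real + errD_fake              # only used for logging/printing
print(torch.allclose(netD.weight.grad, grad_after_real))  # False: gradients accumulated
optimizerD.step()                         # uses the accumulated .grad
```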