Model weights apparently not being modified

Hi!
I’ve been using PyTorch for a couple of weeks, so I’m not sure if I’m doing something wrong, but it seems the model I’m building is not updating its weights on each pass.

My model is as follows:

class NN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.m0 = torch.nn.Sequential(
            torch.nn.Conv1d(402, 1, ks[0], stride=st),
            torch.nn.ReLU(True))
        self.m1 = torch.nn.Sequential(
            torch.nn.Conv1d(402, 1, ks[1], stride=st),
            torch.nn.ReLU(True))
        self.m2 = torch.nn.Sequential(
            torch.nn.Conv1d(402, 1, ks[2], stride=st),
            torch.nn.ReLU(True))
        self.m3 = torch.nn.Sequential(
            torch.nn.Conv1d(402, 1, ks[3], stride=st),
            torch.nn.ReLU(True))
        self.m4 = torch.nn.Sequential(
            torch.nn.Conv1d(402, 1, ks[4], stride=st),
            torch.nn.ReLU(True))
        self.m5 = torch.nn.Sequential(
            torch.nn.Conv1d(402, 1, ks[5], stride=st),
            torch.nn.ReLU(True))

        self.m0[0].weight.data = torch.ones(1,1,ks[0])*0.1
        self.m1[0].weight.data = torch.ones(1,1,ks[1])*0.1
        self.m2[0].weight.data = torch.ones(1,1,ks[2])*0.1
        self.m3[0].weight.data = torch.ones(1,1,ks[3])*0.1
        self.m4[0].weight.data = torch.ones(1,1,ks[4])*0.1
        self.m5[0].weight.data = torch.ones(1,1,ks[5])*0.1

        self.dense = torch.nn.Linear(11124,102,bias=False)
        self.softmax = torch.nn.Softmax(dim=0)

    def forward(self,x):
        y0 = self.m0(x)
        y1 = self.m1(x)
        y2 = self.m2(x)
        y3 = self.m3(x)
        y4 = self.m4(x)
        y5 = self.m5(x)
    
        y = torch.cat((y0,y1,y2,y3,y4,y5),dim=2)

        yf = torch.relu(self.dense(y))
        ys = self.softmax(yf)

        return ys

Which is basically 6 different conv layers with different kernel sizes applied in parallel; the results then get concatenated and passed through a dense layer to fit the labels (402x1 numerical categories or 402x102 one-hot encoded categories).

And my main loop is:

criterion = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
for epoch in range(0,10):
    optimizer.zero_grad()
    outputs = model(xt)
    loss = criterion(outputs, yt)
    loss.backward()
    optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))
    print(outputs[1:5,0,0])

The input data is a matrix of hyperspectral values for a set of frequencies, and the labels are categories for the different curve shapes (I’ve tried both numerical categories and one-hot encoding, with the same result).

Now, when i run it, i just get:

epoch 0, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 1, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 2, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 3, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 4, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 5, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 6, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 7, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 8, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)
epoch 9, loss -0.0
tensor([0.0025, 0.0025, 0.0025, 0.0025], device='cuda:0',grad_fn=<SelectBackward0>)

So apparently the weights are not being updated and the model just keeps the random initial ones (it doesn’t matter which timestep I look at). I’m not sure if I’m missing something or doing something wrong; as I said, I’ve been using PyTorch for barely two weeks. Any ideas?

Thanks in advance!

nn.CrossEntropyLoss expects raw logits so remove the self.softmax from your model. Once this is done, also remove the torch.relu and return the logits created by self.dense(y) directly.
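
Something like this should work for the forward method (a sketch of the code you posted, just with the final activations removed; nn.CrossEntropyLoss applies log-softmax internally):

def forward(self, x):
    y0 = self.m0(x)
    y1 = self.m1(x)
    y2 = self.m2(x)
    y3 = self.m3(x)
    y4 = self.m4(x)
    y5 = self.m5(x)
    y = torch.cat((y0, y1, y2, y3, y4, y5), dim=2)
    # return the raw logits; nn.CrossEntropyLoss combines log-softmax and NLLLoss internally
    return self.dense(y)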

I’m gonna try that and report back, thanks!

I saw another post from someone with exactly the same issue who solved it by using a sigmoid instead of a softmax and MSELoss for the loss function; I tried that and it worked too. I’m gonna test your solution.

EDIT: Removing the softmax and the relu from the final output and using only the dense layer does the same; the weights are now updating.

I’m not sure if it could also matter, but PyTorch didn’t let me use weights with dim < 3, so I had to reshape my input from (n, w) to (n, 1, w), since there is only one channel (same for the labels).

Could you describe which weights are used and were failing?
Just want to make sure your shapes are correct and e.g. the loss calculation is correct.

Sorry, I didn’t explain myself correctly!

My data is 402 observations, each one made of 2151 features with length 1 (only one channel). I was a bit confused about how to use conv1d, so I tried to use the input as a (402, 2151) tensor and code the whole NN accordingly (using a (1, n) conv kernel), but I got the error that the kernel must be at least 3-dimensional, so I reshaped the input tensor to (402, 1, 2151) and did the same with the labels. I wasn’t sure if that might be the issue, e.g. maybe when there is only one channel the reshape must be done differently.

I tried your solution, but i get similar results, the weights are not updating with each step.

About the same: for categorical labels I’m using one-hot encoding with shape (402, 1, 102) for the 102 classes I have, but I think I read that PyTorch can also work with numerical categories. Is that correct?

I checked this post: Model weights not being updated

And the solution posted there, changing the loss to MSELoss and the softmax to a sigmoid, worked for me: by doing that, the weights update on each step. But based on what you wrote, I’m not sure if what I’m doing is correct.

nn.Conv1d expects an input in the shape [batch_size, channels, seq_len] and will add a batch dimension with the size of 1 if it’s missing. Unsqueezing the tensor and adding the channel dimension might be the right approach depending on your use case.
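
For example (a small sketch, assuming your features are stored in a plain 2D tensor):

import torch

x = torch.randn(402, 2151)  # [batch_size, features], one channel
x = x.unsqueeze(1)          # -> [402, 1, 2151], i.e. [batch_size, channels, seq_len]
print(x.shape)              # torch.Size([402, 1, 2151])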

I assume the weights receive valid gradients but the updates are just small or did you verify that no gradients are received at all? The latter case would not make sense since it seems other loss functions work.

By default nn.CrossEntropyLoss expects class indices, but in newer PyTorch versions also accepts “soft” targets, which can be a one-hot encoded tensor (although it would be wasteful compared to class indices since you would store a lot of zeroes without taking the advantage of using soft targets, i.e. probabilities != 0/1).

Different loss functions could work, but nn.CrossEntropyLoss is commonly used for multi-class classification use cases.

How can I verify that the gradients are being received? I was comparing model.parameters() and they were the same before and after optimizer.step().

You could either compare copies of the parameters before and after the optimizer.step() call, or you could check the .grad attributes of all trainable parameters after the backward call to confirm that gradients are indeed computed.
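
E.g. something like this right after the backward call (a rough sketch; model and optimizer refer to your objects):

loss.backward()
# inspect the gradients before the update
for name, param in model.named_parameters():
    if param.grad is None:
        print(name, 'has no gradient')
    else:
        print(name, param.grad.abs().sum())
optimizer.step()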

Just added in the main loop:

    a = list(model.parameters())[0].clone()
    optimizer.step()
    b = list(model.parameters())[0].clone()
    print(torch.equal(a.data,b.data))

And the output is True in every iteration. That’s following your suggestion (removing the relu and just using the output of the dense as the forward-pass output) and using a CrossEntropy loss.

Edit: And i just checked the grad in the last step and it’s all 0. The same happens regardless of the learning rate.

In this case you could start by checking your loss and making sure it’s !=0.

Yep, the loss is indeed 0, didn’t think of that…

tensor(-0., device='cuda:0', grad_fn=<DivBackward1>)

What could be causing it? Could it be maybe some issue with the data/label format?

My data -and its shape- is as follows:

tensor([[[0.3310, 0.3310, 0.3330,  ..., 0.6970, 0.6970, 0.6960]],
        [[0.2010, 0.2010, 0.2010,  ..., 0.5790, 0.5800, 0.5810]],
        [[0.1000, 0.0980, 0.0980,  ..., 0.3560, 0.3560, 0.3570]],
        ...,
        [[0.3040, 0.3110, 0.3120,  ..., 0.4890, 0.4950, 0.4990]],
        [[0.1830, 0.1920, 0.1980,  ..., 0.2700, 0.2740, 0.2770]],
        [[0.2120, 0.2250, 0.2450,  ..., 0.1970, 0.1980, 0.2000]]],
       device='cuda:0')

torch.Size([402, 1, 2151])

And my labels (i switched them to numerical):

tensor([[[  0.]],
        [[  0.]],
        [[  0.]],
        [[  0.]],
        [[  0.]],
        [[  0.]],
        [[  0.]],
        [[  0.]],
        [[  0.]],
        [[  0.]],
        [[  1.]],
        [[  1.]],
        [[  1.]],
        ...,
torch.Size([402, 1, 1])

Using them as-is or one-hot encoded yields the same result.

Do i have to specify something when i declare the losses and/or the optimizer?
I’m initializing them like this:

criterion = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)

Maybe i’m missing something :worried:

EDIT: I just checked and the forward pass is indeed generating a result, so, the issue seems to be the loss.

Based on the shape of the model output and target you are working with a single class only, so all samples belong to class0 and the network cannot be wrong. nn.CrossEntropyLoss expects a model output in the shape [batch_size, nb_classes, *] where * denotes additional dimensions. In your case nb_classes=1 which would explain the issue.
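
Here is a small example reproducing the zero loss with the shapes you’ve posted (just for illustration; it assumes a recent PyTorch version that supports probability targets):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
output = torch.randn(402, 1, 102, requires_grad=True)  # [batch_size, nb_classes=1, *]
target = torch.rand(402, 1, 102)                       # float target, same shape as the output
loss = criterion(output, target)
print(loss)
# prints a zero loss (the -0. you are seeing), since log_softmax over a single class is always 0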

Could you explain that to me again?

I was just using one-hot encoding, which was (402, 1, 102); hence each label was a [0, 0, 0, …, 1, 0, 0] array with the variable hot-encoded. I just changed it to numerical categories because of what we were discussing, so it is now (402, 1, 1).

How could i exactly fix this?

This is the label shape I was using up until yesterday, with the same results; I just changed it to numerical categories since I understood it was equivalent.

tensor([[[1., 0., 0.,  ..., 0., 0., 0.]],
        [[1., 0., 0.,  ..., 0., 0., 0.]],
        [[1., 0., 0.,  ..., 0., 0., 0.]],
        ...,
        [[0., 0., 0.,  ..., 0., 0., 1.]],
        [[0., 0., 0.,  ..., 0., 0., 1.]],
        [[0., 0., 0.,  ..., 0., 0., 1.]]], device='cuda:0')
>>> dim(yt)
torch.Size([402, 1, 102])

Could you please expand a little on what i’m doing wrong?

I have 102 classes; how can I reshape that (if I’m using numerical categories) into a (402, 102, 1) array, or into a (402, 102, 102) array with one-hot encoding?

I don’t know where these additional dimensions come from, but this code shows the expected shapes for a multi-class classification use case:

import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size = 402
nb_classes = 102

criterion = nn.CrossEntropyLoss()

output = torch.randn(batch_size, nb_classes, requires_grad=True)
# as class indices
target = torch.randint(0, nb_classes, (batch_size,))

loss = criterion(output, target)
print(loss)
# tensor(5.2173, grad_fn=<NllLossBackward0>)

# as soft targets
target = F.one_hot(target, 102)
target = target.float()
print(target.shape)
# torch.Size([402, 102])

loss = criterion(output, target)
print(loss)
# tensor(5.2173, grad_fn=<DivBackward1>)

The problem is that my dense layer produces a (402,1,102) output, not (402,102). I think my issue is in the Conv1d args.

This was trivial to do in TensorFlow, but somehow here I’m not sure if what I’m doing is correct. I want to apply a standard convolution to an n x 1 array; that’s what my conv1d layers are meant to do (varying the kernel size in each case). Am I doing it right?

My base input is 402 (observations/rows/entries/batch_size) x 2151 (features), with only one channel per observation. I want to apply n different convolutions to those 2151 features; how would that be done using Conv1d in PyTorch? (Let’s say one of them with a conv kernel of [0.1, 0.1, 0.1, 0.1].)

I’m totally lost with this. I really have no clue what I’m doing, and the more I search for the input shapes of every layer I’m using, the more contradictory information I find. I’ve tried a million different ways but nothing seems to work.

I think I solved it… I just added a reshape((402, 102)) to the final output of the NN (which was (402, 1, 102)) and now it’s apparently working properly.
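
In case it helps anyone else, this is roughly what the fix looks like (a sketch of my loop; xt is the (402, 1, 2151) input and yt is assumed to be flattened to a (402,) tensor of class indices, as suggested above):

outputs = model(xt)                  # [402, 1, 102]
outputs = outputs.reshape(402, 102)  # [batch_size, nb_classes]; squeeze(1) would work too
loss = criterion(outputs, yt)        # criterion = nn.CrossEntropyLoss()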

Hello,
I’m working on a conditional GAN to generate medical images. For testing purposes I ran the code on images resized to 128x128 for 7500 epochs and it works fine. But now, when I change the code to train on images resized to 512x512, the image generated from noise in the first epoch never gets updated. I have no idea what’s going wrong.

Code for 128x128 image feed:

1. Loading the data:

train_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
train_dataset = datasets.ImageFolder(root='data/KB_Images/', transform=train_transform)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=2, shuffle=True)

2. GAN Architecture:

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        torch.nn.init.normal_(m.weight, 1.0, 0.02)
        torch.nn.init.zeros_(m.bias)

class Generator(nn.Module):

    def __init__(self):
        super(Generator, self).__init__()
        self.label_conditioned_generator = nn.Sequential(
            nn.Embedding(3, 100),
            nn.Linear(100, 16))
        self.latent = nn.Sequential(
            nn.Linear(100, 4*4*512),
            nn.LeakyReLU(0.2, inplace=True))
        self.model = nn.Sequential(
            nn.ConvTranspose2d(513, 64*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*8, momentum=0.1, eps=0.8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*8, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4, momentum=0.1, eps=0.8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*4, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2, momentum=0.1, eps=0.8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*2, 64*1, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*1, momentum=0.1, eps=0.8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*1, 3, 4, 2, 1, bias=False),
            nn.Tanh())

    def forward(self, inputs):
        noise_vector, label = inputs
        label_output = self.label_conditioned_generator(label)
        label_output = label_output.view(-1, 1, 4, 4)
        latent_output = self.latent(noise_vector)
        latent_output = latent_output.view(-1, 512, 4, 4)
        concat = torch.cat((latent_output, label_output), dim=1)
        image = self.model(concat)
        #print(image.size())
        return image

class Discriminator(nn.Module):

    def __init__(self):
        super(Discriminator, self).__init__()
        self.label_condition_disc = nn.Sequential(
            nn.Embedding(3, 100),
            nn.Linear(100, 3*128*128))
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64*2, 4, 3, 2, bias=False),
            nn.BatchNorm2d(64*2, momentum=0.1, eps=0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*2, 64*4, 4, 3, 2, bias=False),
            nn.BatchNorm2d(64*4, momentum=0.1, eps=0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*4, 64*8, 4, 3, 2, bias=False),
            nn.BatchNorm2d(64*8, momentum=0.1, eps=0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Flatten(),
            nn.Dropout(0.4),
            nn.Linear(9216, 1),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        img, label = inputs
        label_output = self.label_condition_disc(label)
        label_output = label_output.view(-1, 3, 128, 128)
        concat = torch.cat((img, label_output), dim=-1)
        #print(concat.size())
        output = self.model(concat)
        return output

def noise(n, n_features=128):  # this used to be 128
    return Variable(torch.randn(n, n_features)).to(device)

3. Optimizer and Losses:

device = 'cuda'
generator = Generator().to(device)
generator.apply(weights_init)
discriminator = Discriminator().to(device)
discriminator.apply(weights_init)
learning_rate = 0.0001
G_optimizer = optim.Adam(generator.parameters(), lr=learning_rate, betas=(0.5, 0.999))
D_optimizer = optim.Adam(discriminator.parameters(), lr=learning_rate, betas=(0.5, 0.999))
images = []
test_noise = noise(100)
adversarial_loss = nn.BCELoss()

def generator_loss(fake_output, label):
    gen_loss = adversarial_loss(fake_output, label)
    #print(gen_loss)
    return gen_loss

def discriminator_loss(output, label):
    disc_loss = adversarial_loss(output, label)
    return disc_loss

4. Training:

num_epochs = 7500
for epoch in range(1, num_epochs+1):

    D_loss_list, G_loss_list = [], []
    g_error, d_error = 0.0, 0.0
    print("Epoch :", epoch)
    for index, (real_images, labels) in enumerate(train_loader):
        D_optimizer.zero_grad()
        real_images = real_images.to(device)
        labels = labels.to(device)
        labels = labels.unsqueeze(1).long()

        real_target = Variable(torch.ones(real_images.size(0), 1).to(device))
        fake_target = Variable(torch.zeros(real_images.size(0), 1).to(device))

        D_real_loss = discriminator_loss(discriminator((real_images, labels)), real_target)
        # print(discriminator(real_images))
        #D_real_loss.backward()

        noise_vector = torch.randn(real_images.size(0), 100, device=device)
        noise_vector = noise_vector.to(device)

        generated_image = generator((noise_vector, labels))
        output = discriminator((generated_image.detach(), labels))
        D_fake_loss = discriminator_loss(output, fake_target)

        # train with fake
        #D_fake_loss.backward()

        D_total_loss = (D_real_loss + D_fake_loss) / 2
        D_loss_list.append(D_total_loss)

        D_total_loss.backward()
        D_optimizer.step()

        # Train generator with real labels
        G_optimizer.zero_grad()
        G_loss = generator_loss(discriminator((generated_image, labels)), real_target)
        G_loss_list.append(G_loss)

        G_loss.backward()
        G_optimizer.step()

        g_error += G_loss
        d_error += D_total_loss

        if index % 15 == 0:
            vutils.save_image(real_images, '%s/real_samples.png' % "./results_epochs_7500", normalize=True)
            fake = generator((noise_vector, labels))
            vutils.save_image(fake.data, '%s/fake_samples_epoch_%03d.png' % ("./results_epochs_7500", epoch), normalize=True)
    print('Epoch {}: g_loss: {:.8f} d_loss: {:.8f}\r'.format(epoch, g_error/index, d_error/index))

    img = generator((noise_vector, labels)).cpu().detach()
    # if epoch%100==0:
    #     for i in range(img.shape[0]):
    #         vutils.save_image(img[i], '%s/fake_samples_epoch_%03d_img_%01d.png' % ("./results", epoch, i), normalize=True)
    img = make_grid(img)
    images.append(img)

print('Training Finished')
torch.save(generator.state_dict(), 'Conditional-GAN.pth')

frames = []
for i in range(1, len(images)+1):
    image = imageio.v2.imread('%s/fake_samples_epoch_%03d.png' % ("./results_epochs_7500", i))
    frames.append(image[:500])
    print('%s/fake_samples_epoch_%03d.png' % ("./results_epochs_7500", i))

imageio.mimsave('./progress.gif',  # output gif
                frames,            # array of input frames
                fps=7)             # optional: frames per second

Real Sample Used for Training:
real_samples

Epoch 1 Noise Image :
fake_samples_epoch_001

Epoch 6000 Noise Image :
fake_samples_epoch_6630



Below is Code for 512x512 Image feed :
train_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
train_dataset = datasets.ImageFolder(root='data/KB_Images/', transform=train_transform)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=1, shuffle=True)

2. GAN Architecture:

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        torch.nn.init.normal_(m.weight, 1.0, 0.02)
        torch.nn.init.zeros_(m.bias)

class Generator(nn.Module):

    def __init__(self):
        super(Generator, self).__init__()

        self.label_conditioned_generator = nn.Sequential(
            nn.Embedding(3, 100),
            nn.Linear(100, 256))  # Changed from 16 to 256

        # Initial generator from latent space
        self.latent = nn.Sequential(
            nn.Linear(100, 16*16*512),
            nn.LeakyReLU(0.2, inplace=True))

        # Generator model definition
        self.model = nn.Sequential(
            nn.ConvTranspose2d(513, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(True),
            nn.ConvTranspose2d(32, 3, 4, 2, 1, bias=False),
            nn.Tanh())

    def forward(self, inputs):
        noise_vector, label = inputs
        label_output = self.label_conditioned_generator(label)
        label_output = label_output.view(-1, 1, 16, 16)  # Correctly reshaped now
        latent_output = self.latent(noise_vector)
        latent_output = latent_output.view(-1, 512, 16, 16)
        concat = torch.cat((latent_output, label_output), dim=1)
        image = self.model(concat)
        return image

class Discriminator(nn.Module):

    def __init__(self):
        super(Discriminator, self).__init__()

        self.label_condition_disc = nn.Sequential(
            nn.Embedding(3, 100),
            nn.Linear(100, 3*512*512))  # Adjusted for 512x512 resolution

        self.model = nn.Sequential(
            nn.Conv2d(6, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, 2, 1, bias=False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Flatten(),
            nn.Dropout(0.4),
            nn.Linear(524288, 1),  # Updated to match the correct number of features
            nn.Sigmoid()
        )

    def forward(self, inputs):
        img, label = inputs
        label_output = self.label_condition_disc(label)
        label_output = label_output.view(-1, 3, 512, 512)
        concat = torch.cat((img, label_output), dim=1)
        output = self.model(concat)
        return output

def noise(n, n_features=512):  # this used to be 128
    return Variable(torch.randn(n, n_features)).to(device)

3. Optimizer and Losses:

device = 'cuda:1'
generator = Generator().to(device)
generator.apply(weights_init)
discriminator = Discriminator().to(device)
discriminator.apply(weights_init)
learning_rate = 0.0002
G_optimizer = optim.Adam(generator.parameters(), lr=learning_rate, betas=(0.5, 0.999))
D_optimizer = optim.Adam(discriminator.parameters(), lr=learning_rate, betas=(0.5, 0.999))
images = []
test_noise = noise(512)  # This used to be 100
adversarial_loss = nn.BCELoss()

def generator_loss(fake_output, label):
    gen_loss = adversarial_loss(fake_output, label)
    #print(gen_loss)
    return gen_loss

def discriminator_loss(output, label):
    disc_loss = adversarial_loss(output, label)
    return disc_loss

4. Training:

num_epochs = 10000
for epoch in range(1, num_epochs + 1):

    D_loss_list, G_loss_list = [], []
    g_error, d_error = 0.0, 0.0
    print("Epoch :", epoch)
    for index, (real_images, labels) in enumerate(train_loader):
        D_optimizer.zero_grad()
        real_images = real_images.to(device)
        labels = labels.to(device)
        labels = labels.unsqueeze(1).long()

        real_target = Variable(torch.ones(real_images.size(0), 1).to(device))
        fake_target = Variable(torch.zeros(real_images.size(0), 1).to(device))

        D_real_loss = discriminator_loss(discriminator((real_images, labels)), real_target)
        # print(discriminator(real_images))
        #D_real_loss.backward()

        noise_vector = torch.randn(real_images.size(0), 100, device=device)
        noise_vector = noise_vector.to(device)

        generated_image = generator((noise_vector, labels))
        output = discriminator((generated_image.detach(), labels))
        D_fake_loss = discriminator_loss(output, fake_target)

        # train with fake
        #D_fake_loss.backward()

        D_total_loss = (D_real_loss + D_fake_loss) / 2
        D_loss_list.append(D_total_loss)

        D_total_loss.backward()
        D_optimizer.step()

        # Train generator with real labels
        G_optimizer.zero_grad()
        G_loss = generator_loss(discriminator((generated_image, labels)), real_target)
        G_loss_list.append(G_loss)

        G_loss.backward()
        G_optimizer.step()

        g_error += G_loss
        d_error += D_total_loss

        if index % 15 == 0:
            vutils.save_image(real_images, '%s/real_samples.png' % "./results_epochs_10000_512x512", normalize=True)
            fake = generator((noise_vector, labels))
            vutils.save_image(fake.data, '%s/fake_samples_epoch_%03d.png' % ("./results_epochs_10000_512x512", epoch), normalize=True)
    print('Epoch {}: g_loss: {:.8f} d_loss: {:.8f}\r'.format(epoch, g_error/index, d_error/index))

    img = generator((noise_vector, labels)).cpu().detach()
    if epoch % 100 == 0:
        for i in range(img.shape[0]):
            vutils.save_image(img[i], '%s/fake_samples_epoch_%03d_img_%01d.png' % ("./results_epochs_10000_512x512", epoch, i), normalize=True)
    img = make_grid(img)
    images.append(img)

print('Training Finished')
torch.save(generator.state_dict(), 'Conditional-GAN.pth')

frames = []
for i in range(1, len(images)+1):
    image = imageio.v2.imread('%s/fake_samples_epoch_%03d.png' % ("./results_epochs_10000_512x512", i))
    frames.append(image[:500])
    print('%s/fake_samples_epoch_%03d.png' % ("./results_epochs_10000_512x512", i))

imageio.mimsave('./progress.gif',  # output gif
                frames,            # array of input frames
                fps=7)             # optional: frames per second

Real Sample Used for Training:

Epoch 1 Noise Image :

Epoch 2000 Noise Image :

Below is the loss output of both Generator and Discriminator
Epoch : 1
Epoch 1: g_loss: 1.32907557 d_loss: 42.10420227
Epoch : 2
Epoch 2: g_loss: 0.00000000 d_loss: 50.18867874
Epoch : 3
Epoch 3: g_loss: 0.00000000 d_loss: 50.18867874
Epoch : 4
Epoch 4: g_loss: 0.00000000 d_loss: 50.18867874
Epoch : 5
Epoch 5: g_loss: 0.00000000 d_loss: 50.18867874
Epoch : 6
Epoch 6: g_loss: 0.00000000 d_loss: 50.18867874
Epoch : 7
Epoch 7: g_loss: 0.00000000 d_loss: 50.18867874

Both the G and D losses stay constant across all epochs.

Can you please let me know where to make the changes so the model can be trained properly on 512x512 images?