Loss doesn't decrease

I’ve been trying to develop an autoencoder model for the task of knowledge representation, where my input is a sequence of images. The loss of this model (both training and testing) doesn’t decrease; it just fluctuates, and as a result my image reconstructions are very poor. I’ve tried the following:

  1. Changing the learning rate between 1 and 0.001
  2. Increasing and decreasing the batch size
  3. Adding and removing dropout layers

Encoder-Decoder model

import torch
import torch.nn as nn
import torch.optim as optim


class Encoder_Decoder(nn.Module):

    def __init__(self):
        super(Encoder_Decoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(nn.Conv3d(in_channels=3, out_channels=16, kernel_size=(3, 3, 3), padding=(0, 0, 0)),
                                     nn.ReLU(), nn.BatchNorm3d(16),
                                     nn.MaxPool3d(kernel_size=(1, 2, 2)),
                                     nn.Conv3d(in_channels=16, out_channels=64, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
                                     nn.ReLU(), nn.BatchNorm3d(64),
                                     nn.MaxPool3d(kernel_size=(2, 2, 2)),
                                     nn.Conv3d(in_channels=64, out_channels=256, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
                                     nn.ReLU(), nn.BatchNorm3d(256),
                                     nn.MaxPool3d(kernel_size=(2, 2, 2)),
                                     nn.Conv3d(in_channels=256, out_channels=64, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
                                     nn.ReLU(), nn.BatchNorm3d(64),
                                     nn.MaxPool3d(kernel_size=(2, 2, 2)),
                                     nn.Conv3d(in_channels=64, out_channels=16, kernel_size=(1, 1, 1), padding=(0, 0, 0)),
                                     nn.ReLU(), nn.BatchNorm3d(16),
                                     nn.MaxPool3d(kernel_size=(2, 1, 1)),
                                     nn.Conv3d(in_channels=16, out_channels=4, kernel_size=(1, 1, 1), padding=(0, 0, 0)),
                                     nn.ReLU(), nn.BatchNorm3d(4),
                                     nn.MaxPool3d(kernel_size=(1, 1, 1)))

        # Decoder
        self.decoder = nn.Sequential(nn.ConvTranspose3d(in_channels=4, out_channels=16, kernel_size=3),
                                     nn.ReLU(),
                                     nn.Upsample(scale_factor=(2, 2, 2)),
                                     nn.ConvTranspose3d(in_channels=16, out_channels=64, kernel_size=3),
                                     nn.ReLU(),
                                     nn.Upsample(scale_factor=(2, 2, 2)),
                                     nn.ConvTranspose3d(in_channels=64, out_channels=256, kernel_size=3),
                                     nn.ReLU(),
                                     nn.Upsample(scale_factor=(2, 2, 2)),
                                     nn.ConvTranspose3d(in_channels=256, out_channels=64, kernel_size=3),
                                     nn.ReLU(),
                                     nn.Upsample(scale_factor=(2, 2, 2)),
                                     nn.ConvTranspose3d(in_channels=64, out_channels=16, kernel_size=1),
                                     nn.ReLU(),
                                     nn.Upsample(size=(22, 28, 28)),
                                     nn.ConvTranspose3d(in_channels=16, out_channels=3, kernel_size=1),
                                     nn.ReLU(),
                                     nn.Upsample(size=(22, 28, 28)))

    def forward(self, x):
        # x has the shape (16,22,3,28,28)
        x1 = self.encoder(x)
        # x1 has the shape (16,4,1,1,1)
        x2 = self.decoder(x1)
        # x2 has the shape (16,22,3,28,28)
        return x1, x2
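As a sanity check on the architecture, it can help to push a dummy batch through the model and print the intermediate shapes. Below is a minimal sketch, assuming the input is laid out as (N, C, D, H, W) = (16, 3, 22, 28, 28), which is the layout nn.Conv3d expects (batch, channels, frames, height, width); the batch size and frame count here are placeholders.

# Hypothetical shape check with random data; nn.Conv3d expects
# channels in dim 1, i.e. (batch, channels, frames, height, width).
model = Encoder_Decoder()
dummy = torch.randn(16, 3, 22, 28, 28)
encoded, reconstructed = model(dummy)
print(encoded.shape)        # torch.Size([16, 4, 1, 1, 1])
print(reconstructed.shape)  # torch.Size([16, 3, 22, 28, 28])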

Train:

def train(model, trainloader, criterion, optimizer, epoch):
    model.train()

    for batch_idx, inputs in enumerate(trainloader):
        inputs = inputs.float()

        if torch.cuda.is_available():
            inputs = inputs.to("cuda")

        optimizer.zero_grad()
        encoded_vectors, outputs = model(inputs)
        loss = criterion(outputs, inputs)
        loss.backward()
        optimizer.step()

        if batch_idx % 50 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(inputs), len(trainloader.dataset),
                100. * batch_idx / len(trainloader), loss.item()))

Test:

def test(model, criterion, testloader):
    model.eval()
    test_loss = 0

    with torch.no_grad():
        for batch_idx, inputs in enumerate(testloader):
            inputs = inputs.float()

            if torch.cuda.is_available():
                inputs = inputs.cuda()

            encoded_vectors, outputs = model(inputs)
            loss = criterion(outputs, inputs)
            test_loss += loss.item() * inputs.shape[0]

    test_loss /= len(testloader.dataset)
    print('\nTest set: Average loss: {:.4f}\n'.format(test_loss))

    if test_loss <= 0.005:
        return True
    else:
        return False

def main():
    model = Encoder_Decoder()
    model = model.to('cuda')

    criterion = torch.nn.MSELoss()
    optimizer = optim.AdamW(model.parameters(), lr=0.005, eps=1e-3, amsgrad=False)

    for epoch in range(20):
        train(model, train_loader, criterion, optimizer, epoch)
        test(model, criterion, test_loader)

    return model


if __name__ == "__main__":
    model = main()

I have also tried experimenting with different loss functions and different dimensions for the latent space (encoding vector), but nothing has worked, and I have been stuck on this for more than two weeks. @ptrblck, could you kindly take a look? Many thanks in advance.

This is what my loss looks like:

Train Epoch: 0 [0/2000 (0%)] Loss: 0.890425
Train Epoch: 0 [400/2000 (20%)] Loss: 0.739331
Train Epoch: 0 [800/2000 (40%)] Loss: 0.740495
Train Epoch: 0 [1200/2000 (60%)] Loss: 0.737210
Train Epoch: 0 [1600/2000 (80%)] Loss: 0.742648

Test set: Average loss: 0.7384

Train Epoch: 1 [0/2000 (0%)] Loss: 0.738974
Train Epoch: 1 [400/2000 (20%)] Loss: 0.739232
Train Epoch: 1 [800/2000 (40%)] Loss: 0.737139
Train Epoch: 1 [1200/2000 (60%)] Loss: 0.736564
Train Epoch: 1 [1600/2000 (80%)] Loss: 0.737216

Test set: Average loss: 0.7384

Train Epoch: 2 [0/2000 (0%)] Loss: 0.737539
Train Epoch: 2 [400/2000 (20%)] Loss: 0.737013
Train Epoch: 2 [800/2000 (40%)] Loss: 0.745166
Train Epoch: 2 [1200/2000 (60%)] Loss: 0.740684
Train Epoch: 2 [1600/2000 (80%)] Loss: 0.735286

Test set: Average loss: 0.7384

Train Epoch: 3 [0/2000 (0%)] Loss: 0.738531
Train Epoch: 3 [400/2000 (20%)] Loss: 0.741940
Train Epoch: 3 [800/2000 (40%)] Loss: 0.737881
Train Epoch: 3 [1200/2000 (60%)] Loss: 0.737666
Train Epoch: 3 [1600/2000 (80%)] Loss: 0.739561

Test set: Average loss: 0.7384

Train Epoch: 4 [0/2000 (0%)] Loss: 0.741185
Train Epoch: 4 [400/2000 (20%)] Loss: 0.736491
Train Epoch: 4 [800/2000 (40%)] Loss: 0.741242
Train Epoch: 4 [1200/2000 (60%)] Loss: 0.738887
Train Epoch: 4 [1600/2000 (80%)] Loss: 0.738182

Test set: Average loss: 0.7384

Train Epoch: 5 [0/2000 (0%)] Loss: 0.738633
Train Epoch: 5 [400/2000 (20%)] Loss: 0.741287
Train Epoch: 5 [800/2000 (40%)] Loss: 0.744609
Train Epoch: 5 [1200/2000 (60%)] Loss: 0.738787
Train Epoch: 5 [1600/2000 (80%)] Loss: 0.742097

Test set: Average loss: 0.7384

Train Epoch: 6 [0/2000 (0%)] Loss: 0.737597
Train Epoch: 6 [400/2000 (20%)] Loss: 0.738081
Train Epoch: 6 [800/2000 (40%)] Loss: 0.734609
Train Epoch: 6 [1200/2000 (60%)] Loss: 0.738837
Train Epoch: 6 [1600/2000 (80%)] Loss: 0.739030

Test set: Average loss: 0.7384

Train Epoch: 7 [0/2000 (0%)] Loss: 0.742665
Train Epoch: 7 [400/2000 (20%)] Loss: 0.737820
Train Epoch: 7 [800/2000 (40%)] Loss: 0.740105
Train Epoch: 7 [1200/2000 (60%)] Loss: 0.734893
Train Epoch: 7 [1600/2000 (80%)] Loss: 0.740252

Test set: Average loss: 0.7384

Train Epoch: 8 [0/2000 (0%)] Loss: 0.740616
Train Epoch: 8 [400/2000 (20%)] Loss: 0.741398
Train Epoch: 8 [800/2000 (40%)] Loss: 0.740078
Train Epoch: 8 [1200/2000 (60%)] Loss: 0.739548
Train Epoch: 8 [1600/2000 (80%)] Loss: 0.739944

Test set: Average loss: 0.7384

Train Epoch: 9 [0/2000 (0%)] Loss: 0.739905
Train Epoch: 9 [400/2000 (20%)] Loss: 0.738651
Train Epoch: 9 [800/2000 (40%)] Loss: 0.737514
Train Epoch: 9 [1200/2000 (60%)] Loss: 0.734963
Train Epoch: 9 [1600/2000 (80%)] Loss: 0.735477

Test set: Average loss: 0.7384

Train Epoch: 10 [0/2000 (0%)] Loss: 0.736167
Train Epoch: 10 [400/2000 (20%)] Loss: 0.738089
Train Epoch: 10 [800/2000 (40%)] Loss: 0.733951
Train Epoch: 10 [1200/2000 (60%)] Loss: 0.738294
Train Epoch: 10 [1600/2000 (80%)] Loss: 0.739406

Test set: Average loss: 0.7384

Train Epoch: 11 [0/2000 (0%)] Loss: 0.739151
Train Epoch: 11 [400/2000 (20%)] Loss: 0.737162
Train Epoch: 11 [800/2000 (40%)] Loss: 0.737700
Train Epoch: 11 [1200/2000 (60%)] Loss: 0.738951
Train Epoch: 11 [1600/2000 (80%)] Loss: 0.736479

Test set: Average loss: 0.7384

Train Epoch: 12 [0/2000 (0%)] Loss: 0.738607
Train Epoch: 12 [400/2000 (20%)] Loss: 0.742146
Train Epoch: 12 [800/2000 (40%)] Loss: 0.740505
Train Epoch: 12 [1200/2000 (60%)] Loss: 0.735908
Train Epoch: 12 [1600/2000 (80%)] Loss: 0.742282

Test set: Average loss: 0.7384

Train Epoch: 13 [0/2000 (0%)] Loss: 0.736525
Train Epoch: 13 [400/2000 (20%)] Loss: 0.736685
Train Epoch: 13 [800/2000 (40%)] Loss: 0.734824
Train Epoch: 13 [1200/2000 (60%)] Loss: 0.740992
Train Epoch: 13 [1600/2000 (80%)] Loss: 0.738559

Test set: Average loss: 0.7384

Train Epoch: 14 [0/2000 (0%)] Loss: 0.735638
Train Epoch: 14 [400/2000 (20%)] Loss: 0.737805
Train Epoch: 14 [800/2000 (40%)] Loss: 0.741408
Train Epoch: 14 [1200/2000 (60%)] Loss: 0.731682
Train Epoch: 14 [1600/2000 (80%)] Loss: 0.738875

Test set: Average loss: 0.7384

Train Epoch: 15 [0/2000 (0%)] Loss: 0.740533
Train Epoch: 15 [400/2000 (20%)] Loss: 0.737641
Train Epoch: 15 [800/2000 (40%)] Loss: 0.738011
Train Epoch: 15 [1200/2000 (60%)] Loss: 0.741101
Train Epoch: 15 [1600/2000 (80%)] Loss: 0.739203

Test set: Average loss: 0.7384

Train Epoch: 16 [0/2000 (0%)] Loss: 0.741356
Train Epoch: 16 [400/2000 (20%)] Loss: 0.739178
Train Epoch: 16 [800/2000 (40%)] Loss: 0.737916
Train Epoch: 16 [1200/2000 (60%)] Loss: 0.743919
Train Epoch: 16 [1600/2000 (80%)] Loss: 0.736833

Test set: Average loss: 0.7384

Train Epoch: 17 [0/2000 (0%)] Loss: 0.739630
Train Epoch: 17 [400/2000 (20%)] Loss: 0.739462
Train Epoch: 17 [800/2000 (40%)] Loss: 0.741527
Train Epoch: 17 [1200/2000 (60%)] Loss: 0.733570
Train Epoch: 17 [1600/2000 (80%)] Loss: 0.741055

Test set: Average loss: 0.7384

Train Epoch: 18 [0/2000 (0%)] Loss: 0.739831
Train Epoch: 18 [400/2000 (20%)] Loss: 0.740010
Train Epoch: 18 [800/2000 (40%)] Loss: 0.736455
Train Epoch: 18 [1200/2000 (60%)] Loss: 0.737576
Train Epoch: 18 [1600/2000 (80%)] Loss: 0.736869

Test set: Average loss: 0.7384

Train Epoch: 19 [0/2000 (0%)] Loss: 0.739427
Train Epoch: 19 [400/2000 (20%)] Loss: 0.740225
Train Epoch: 19 [800/2000 (40%)] Loss: 0.743299
Train Epoch: 19 [1200/2000 (60%)] Loss: 0.737947
Train Epoch: 19 [1600/2000 (80%)] Loss: 0.735534

Test set: Average loss: 0.738

Am I making some obvious blunder, or is it possible that my data has no patterns or valid features to learn?

You could start by overfitting a small dataset (e.g. just 10 samples) while playing around with the hyperparameters (optimizer, learning rate, etc.).
If that doesn’t help, I would suggest simplifying the model.
Once you can come up with a model and hyperparameters that overfit this small dataset, you could scale the use case up again by using more data.
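In code, that overfitting check could look roughly like the sketch below. It is only an illustration: `train_dataset` stands in for your own dataset object, and the subset size, learning rate, and epoch count are placeholder values to tune.

import torch
from torch.utils.data import DataLoader, Subset

# Overfitting check on a tiny, fixed subset (here: the first 10 samples).
# train_dataset is a placeholder for your own Dataset of image sequences.
tiny_set = Subset(train_dataset, list(range(10)))
tiny_loader = DataLoader(tiny_set, batch_size=10, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Encoder_Decoder().to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(500):
    for inputs in tiny_loader:
        inputs = inputs.float().to(device)
        optimizer.zero_grad()
        _, outputs = model(inputs)
        loss = criterion(outputs, inputs)
        loss.backward()
        optimizer.step()
    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss {loss.item():.6f}")

# If the loss does not drop close to zero even on 10 samples, the model
# or training setup is the likely problem, not the amount of data.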


I have changed the model and then tried what you suggested. Although my results aren’t that great yet, the technique you suggested is a really good approach, and it helped me a lot.

Thank you :slight_smile: