Loss doesn't decrease while training

Hello there,
I want to classify landscape pictures by whether or not they include cars, but the loss is not decreasing during training; it seems to bounce randomly across a wide range of values (2.X - 0.1X).
Here is the code I ran:

import torch
import torchvision
from torchvision import transforms
from PIL import Image
from os import listdir
import os
import random
import torch.optim as optim
from torch.autograd import Variable
import torch.nn.functional as F
import torch.nn as nn

TRAINDATAPATH = 'C:/Users/.../Desktop/train/'
TESTDATAPATH = 'C:/Users/.../Desktop/test/'

"""normalize = transforms.Normalize(
   mean=[0.485, 0.456, 0.406],
   std=[0.229, 0.224, 0.225]
)"""
normalize = transforms.Normalize(
   mean=[0.5, 0.5, 0.5],
   std=[0.5, 0.5, 0.5]
)
transforms = transforms.Compose([transforms.Resize(256),
                                 transforms.CenterCrop(256),
                                 transforms.ToTensor(),
                                 normalize])

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
train_data_list = []
target_list = []
train_data = []

batch_size = 1

files = listdir(TRAINDATAPATH)

for i in range(len(listdir(TRAINDATAPATH))):
    try:
        f = random.choice(files)
        files.remove(f)
        img = Image.open(TRAINDATAPATH + f)
        img_tensor = transforms(img) # (3,256,256)
        train_data_list.append(img_tensor)
        isObj = 1 if 'obj' in f else 0
        isNotObj = 0 if 'obj' in f else 1
        target = [isObj, isNotObj]

        target_list.append(target)
        if len(train_data_list) >= 1:
            train_data.append((torch.stack(train_data_list), target_list))
            train_data_list = []
            target_list = []
            print('Loaded batch ', int(len(train_data)/batch_size), 'of ', int(len(listdir(TRAINDATAPATH))/batch_size))
            print('Percentage Done: ', 100*int(len(train_data)/batch_size)/int(len(listdir(TRAINDATAPATH))/batch_size), '%')
    except Exception as e:
        print("Error occurred but ignored")
        print(str(e))
        continue

class Netz(nn.Module):
    def __init__(self):
        super(Netz, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 12, kernel_size=5)
        self.conv3 = nn.Conv2d(12, 18, kernel_size=5)
        self.conv4 = nn.Conv2d(18, 24, kernel_size=5)
        self.fc1 = nn.Linear(3456, 1000)
        self.fc2 = nn.Linear(1000, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = x.view(-1,3456)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return torch.sigmoid(x)

model = Netz()
model.to(device)

optimizer = optim.Adam(model.parameters(), lr=0.0005)#Adadelta(model.parameters(), lr=10)
def train(epoch):
    global model
    
    model.train()
    batch_idx = 0
    for data, target in train_data:
        batch_idx += 1
        data = data.to(device)
        target = torch.Tensor(target).to(device)
        data = Variable(data)
        target = Variable(target)
        optimizer.zero_grad()
        
        output = model(data)

        criterion = F.binary_cross_entropy
        loss = criterion(output, target)
        loss.backward()
        
        optimizer.step()
        print('Train Epoch: '+ str(epoch) + '\tLoss: ' + str(loss.data.item()) )


def test():
    global model
    
    model.eval()
    files = listdir(TESTDATAPATH)
    f = random.choice(files)
    img = Image.open(TESTDATAPATH + f)
    img_eval_tensor = transforms(img)
    img_eval_tensor.unsqueeze_(0)
    data = Variable(img_eval_tensor.to(device))
    out = model(data)
    string_prediction = str(out.data.max(0, keepdim=True)[1])
    print(string_prediction[9:10])

for epoch in range(1,4):
    train(epoch)
i = 100
while i > 0:
    test()
    i -= 1

Are there any major mistakes? I tried learning rates between 0.0001 and 100, but nothing gave good results.

In the TRAINDATAPATH there are thousands of car images with filenames like “obj_XXX.jpg” and some other images without cars, whose filenames do not include “obj”.
In the TESTDATAPATH there are just random images, some with cars, some without. The NN classifies them all as “not including cars” or “0”, which is incorrect.

Can anyone help?

Some Training results:

Train Epoch: 1 Loss: 0.11131585389375687
Train Epoch: 1 Loss: 0.12454738467931747
Train Epoch: 1 Loss: 0.30456408858299255
Train Epoch: 1 Loss: 0.2579435408115387
Train Epoch: 1 Loss: 0.20009629428386688
Train Epoch: 1 Loss: 0.5095208883285522
Train Epoch: 1 Loss: 0.3108166456222534
Train Epoch: 1 Loss: 0.7725784778594971
Train Epoch: 1 Loss: 0.1262883096933365
Train Epoch: 1 Loss: 0.05287583917379379
Train Epoch: 1 Loss: 0.7402727603912354
Train Epoch: 1 Loss: 0.11581829935312271
Train Epoch: 1 Loss: 0.038188107311725616

First, the normalization std looks too big.
Second, adding batch normalization is recommended.
Third, the convolution channels are quite shallow; why not make the network deeper? If you read the code of any famous architecture, the first layer starts with something like 16 or 32 channels, e.g.:

nn.Conv2d(3, 16, kernel_size=3)
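For the first point, a minimal sketch would be to switch back to the ImageNet statistics that are already commented out at the top of your script (these are the standard torchvision values):

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)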
I hope this answer is helpful.
Thank you.

  1. Add Batch Normalization to your network.
  2. Increase the batch size of your dataloader. A mini-batch size of 1 gives very noisy gradient estimates and is not efficient for optimizing the loss over the entire dataset (see the sketch below).
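A minimal sketch of how that could look with torch.utils.data. This assumes you keep all image tensors in train_data_list and all targets in target_list instead of emptying them after every image, so the variable usage below is an assumption, not your original code:

from torch.utils.data import TensorDataset, DataLoader

# Stack everything once: images -> (N, 3, 256, 256), targets -> (N, 2)
images = torch.stack(train_data_list)
targets = torch.tensor(target_list, dtype=torch.float32)

dataset = TensorDataset(images, targets)
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)

for data, target in train_loader:
    # data has shape (16, 3, 256, 256), target has shape (16, 2)
    ...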

Okay,
thanks for all help.
@caped-vigilante

  1. What do you mean by batch normalization? Can you give an example?
  2. Which batch size would you recommend? I will test it with a batch size of 16.

A batch size of 4 or above is a good start. It all depends on how much GPU memory you have. The bigger the batch size, the more stable the training.

For the batch normalization part, you can try this:

class Netz(nn.Module):
    def __init__(self):
        super(Netz, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.batchnorm1 = nn.BatchNorm2d(6)
        self.conv2 = nn.Conv2d(6, 12, kernel_size=5)
        self.batchnorm2 = nn.BatchNorm2d(12)
        self.conv3 = nn.Conv2d(12, 18, kernel_size=5)
        self.batchnorm3 = nn.BatchNorm2d(18)
        self.conv4 = nn.Conv2d(18, 24, kernel_size=5)
        self.batchnorm4 = nn.BatchNorm2d(24)
        self.fc1 = nn.Linear(3456, 1000)
        self.fc2 = nn.Linear(1000, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.batchnorm1(x)
        x = self.conv2(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.batchnorm2(x)
        x = self.conv3(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.batchnorm3(x)
        x = self.conv4(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.batchnorm4(x)
        x = x.view(-1,3456)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return torch.sigmoid(x)

Appreciate it. I have a GTX1060 with 6GB. I’m going to have a look at my graphics card memory while training.
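In case it helps, one quick way to check that from inside the training loop is PyTorch's built-in CUDA memory counters (just a sketch, not something from this thread):

print('allocated:', torch.cuda.memory_allocated() / 1024**2, 'MiB')
print('peak:', torch.cuda.max_memory_allocated() / 1024**2, 'MiB')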

It depends on the data, which in this case is a 256x256 image, and also on the complexity of the model. With your GPU, you could try batch sizes of 32 / 64 / 128.


I’m glad that you’ve solved the issue.
However, note that you are currently using a multi-label approach, as your model outputs 2 units and you are using sigmoid + F.binary_cross_entropy.
In other words, each sample might contain zero, one, or two valid classes.
Based on your description, it seems you are working on a binary classification, where either the class is valid or not.
If that’s the case, you should use a single output neuron and provide a target with values in the range [0, 1] and in the shape [batch_size, 1].
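A minimal, self-contained sketch of that setup (the feature tensor here is just a stand-in for your fc1 activations, and nn.BCEWithLogitsLoss is used so the sigmoid is folded into the loss for numerical stability):

import torch
import torch.nn as nn

head = nn.Linear(1000, 1)                # replaces fc2 = nn.Linear(1000, 2)
criterion = nn.BCEWithLogitsLoss()       # sigmoid + binary cross entropy in one call

features = torch.randn(4, 1000)          # stand-in for a batch of fc1 activations
logits = head(features)                  # shape [4, 1]; no sigmoid in forward
targets = torch.tensor([[1.], [0.], [1.], [0.]])  # 1 = car, 0 = no car, shape [4, 1]

loss = criterion(logits, targets)
loss.backward()

probs = torch.sigmoid(logits)            # probabilities only when you need them
preds = (probs > 0.5).long()             # 1 = "includes a car", 0 = "no car"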
