Valid Loss in convolutional neural network

Hello!
I am a newbie in neural networks; maybe somebody here can help me?
I have 2 classes and my own dataset of 17000 images (7500 per class for training, and 2000 for testing). I am trying to train a CNN to distinguish modified from non-modified images. My code is below:

import torch
import numpy as np

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')

################### Data loading ###################
import torchvision.datasets
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# Read data from folders
main_path = '../ready_dataset_2classes'
train_data_path = main_path + '/train'
test_data_path = main_path + '/test'
weigths_path = '../ready_dataset_2classes/weights/weights_for_2classes.pt'

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 32 #20
# percentage of training set to use as validation
valid_size = 0.2

# convert data to a normalized torch.FloatTensor
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.CenterCrop(32),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

# choose the training and test datasets
train_data = torchvision.datasets.ImageFolder(root=train_data_path, transform=transform)
train_data_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True,  num_workers=0)
test_data = torchvision.datasets.ImageFolder(root=test_data_path, transform=transform)
test_data_loader  = DataLoader(test_data, batch_size=batch_size, shuffle=True, num_workers=0) 
# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, 
    sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, 
    num_workers=num_workers)

# specify the image classes
classes = ['Modified', 'Original']

import matplotlib.pyplot as plt  # required by imshow below

# helper function to un-normalize and display an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

# creating checkpoints
def savePoint(main_path, model, optimizer, epoch, valid_loss_min):
    torch.save({
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'valid_loss_min': valid_loss_min,
    }, main_path)
    
def loadPoint(main_path, model, optimizer, epoch, valid_loss_min):
    checkpoint = torch.load(main_path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    # rebinding the epoch / valid_loss_min arguments would not reach the caller,
    # so return the restored values instead
    return checkpoint['epoch'], checkpoint['valid_loss_min']

################### Network architecture definition ###################
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees 32x32x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 500)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        # linear layer (500 -> 2, one output per class)
        self.fc2 = nn.Linear(500, 2)
        # dropout layer (p=0.1)
        self.dropout = nn.Dropout(0.1) #0.25

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # output layer: raw logits, no activation (CrossEntropyLoss applies log-softmax itself)
        x = self.fc2(x)
        return x

# create a complete CNN
model = Net()
print(model)

# move tensors to GPU if CUDA is available
if train_on_gpu:
    model.cuda()

# Loss and optimization specification
import torch.optim as optim

# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()

# specify optimizer
optimizer = optim.SGD(model.parameters(), lr=0.005)

################### Network training ###################
# number of epochs to train the model
n_epochs = 300 #5000
print('Starting training!')
valid_loss_min = np.inf # track change in validation loss (np.Inf was removed in NumPy 2.0)
for epoch in range(1, n_epochs+1):
    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
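    # restore model/optimizer state from the most recently saved checkpoint
    # (this call runs at the start of every epoch)
    loadPoint(weigths_path, model, optimizer, epoch, valid_loss_min)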
    ###################
    # train the model #
    ###################
    model.train()
    for data, target in train_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()*data.size(0)
        
    ######################    
    # validate the model #
    ######################

    model.eval()
    with torch.no_grad():  # no gradients are needed for validation
        for data, target in valid_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # update average validation loss
            valid_loss += loss.item()*data.size(0)
    
    # calculate average losses
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(valid_loader.sampler)
        
    # print training/validation statistics 
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))

    #checkpoint
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
        valid_loss_min,
        valid_loss))
        valid_loss_min = valid_loss  # update first so the checkpoint stores the new minimum
        savePoint(weigths_path, model, optimizer, epoch, valid_loss_min)

# Loading the model with the lowest validation loss
#loadPoint(weigths_path, model, optimizer, epoch, valid_loss_min)
#model.load_state_dict(torch.load('weights.pt'))

################### Neural network testing with UNSEEN data ###################

# track test loss
test_loss = 0.0
class_correct = list(0. for i in range(2))
class_total = list(0. for i in range(2))

model.eval()
# iterate over test data
with torch.no_grad():  # no gradients are needed for testing
    for data, target in test_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update test loss
        test_loss += loss.item()*data.size(0)
        # convert output logits to predicted class
        _, pred = torch.max(output, 1)
        # compare predictions to true label (moved to CPU for NumPy)
        correct = pred.eq(target.view_as(pred)).cpu().numpy()
        # calculate test accuracy for each object class
        for i in range(len(target)):
            label = target[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1
        
# average test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(2):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no test examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))

My problem is that when I test the results, I get only 61% recognition accuracy, which is pretty low for me. And my neural network has not learned anything new in about 100 epochs, so it takes a lot of time…
For now, my best result is a training loss of 0.499814 and a validation loss of 0.506415.
What should I change to get a better result?

I have already tried changing the dropout and the learning rate, but nothing really changed much :c
Maybe I am doing something wrong and some of you can give me advice or point out an error?
If I am right, it is the “dying ReLU” problem, but I don’t know how to fix it :sweat_smile:
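
One way to test the “dying ReLU” guess is to measure how many units never produce a positive pre-activation on a batch. The following is only a sketch: `dead_unit_fraction` is a hypothetical helper, not a PyTorch function, and the toy model below stands in for the real network.

```python
import torch
import torch.nn as nn

def dead_unit_fraction(model, layers, batch):
    """For each named layer, return the fraction of output units (channels for
    conv layers) whose pre-activation is <= 0 for every sample and position in
    `batch`. Such units output zero after a ReLU on this batch."""
    stats, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # (N, C, ...) -> (C, N * ...): one row per unit/channel
            per_unit = output.transpose(0, 1).flatten(1)
            stats[name] = per_unit.le(0).all(dim=1).float().mean().item()
        return hook

    for name, layer in layers.items():
        hooks.append(layer.register_forward_hook(make_hook(name)))
    was_training = model.training
    model.eval()                      # switch dropout off while measuring
    with torch.no_grad():
        model(batch)
    model.train(was_training)         # restore the previous mode
    for handle in hooks:
        handle.remove()
    return stats

# Toy check: force exactly one of four hidden units to be dead.
toy = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))
with torch.no_grad():
    toy[0].weight.zero_()
    toy[0].bias.copy_(torch.tensor([-1.0, 1.0, 1.0, 1.0]))
stats = dead_unit_fraction(toy, {'fc1': toy[0]}, torch.randn(16, 3))
print(stats)
```

For the network above, the call would presumably look like `dead_unit_fraction(model, {'conv1': model.conv1, 'conv2': model.conv2, 'conv3': model.conv3, 'fc1': model.fc1}, data)` with one batch `data` from the loader. Fractions near 1.0 would support the dying-ReLU guess, while small fractions would point elsewhere.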

P.S. Sorry for my bad English; I hope I described my problem understandably.

Hi.
I would go for binary cross-entropy loss instead.
Besides, what kind of modifications are made to the images for them to be considered “modified”?
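
A minimal sketch of the binary cross-entropy suggestion, assuming the last layer is changed to emit a single logit per image. The `head` layer and the random `features` batch are hypothetical stand-ins, not part of the posted code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical replacement for the final layer: one logit per image.
head = nn.Linear(500, 1)
# BCEWithLogitsLoss applies the sigmoid internally, so the model outputs raw logits.
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(4, 500)           # fake batch of fc1 activations
target = torch.tensor([0, 1, 1, 0])      # integer labels as ImageFolder yields them

logits = head(features).squeeze(1)       # shape (4,)
loss = criterion(logits, target.float()) # BCE expects float targets of the same shape
pred = (logits > 0).long()               # logit > 0 is equivalent to sigmoid > 0.5

print(loss.item(), pred.tolist())
```

With this setup, the test loop would use `pred = (output.squeeze(1) > 0).long()` in place of `torch.max(output, 1)`.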

If the examples are very hard, the network may not learn. You can try going from simple examples to more challenging ones. The accuracy you get basically means the results are random, so the network is learning nothing. In addition, you may want to increase the number of parameters. This depends on what kind of images you have.
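
To put a number on the “number of parameters”, here is a small sketch that counts trainable parameters for the convolution widths used in the question and for a wider variant. `conv_stack`, `count_params` and the wider widths are illustrative assumptions, not a recommendation from this thread.

```python
import torch.nn as nn

def count_params(module):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

def conv_stack(widths, in_channels=3):
    """Stack of 3x3 conv -> ReLU -> 2x2 max-pool blocks with the given output widths."""
    layers = []
    for out_channels in widths:
        layers += [nn.Conv2d(in_channels, out_channels, 3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2, 2)]
        in_channels = out_channels
    return nn.Sequential(*layers)

small = conv_stack([16, 32, 64])    # widths from the posted Net
wide = conv_stack([32, 64, 128])    # hypothetical wider variant
small_params = count_params(small)
wide_params = count_params(wide)
print(small_params, wide_params)
```

Note that the flattened feature size after the third pooling step grows with the last width, so the first linear layer would have to be resized to match.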

I have images modified with noise, sharpness, color correction and contrast. All of them were modified with different parameters in Photoshop… I am not sure that gives a lot of info, so maybe I can show some examples, or something else?

I would start with simple examples and check the network’s performance, step by step.
If it can identify very noisy images, then reduce the noise, and so on. Then try the same for gamma, going from very unrealistic images to realistic ones.
Train one network per task. Modeling noise is totally different from modeling contrast, for example.
Once each task works when trained independently, try a single network for all of them.
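
The easy-to-hard idea for the noise task could be sketched like this, assuming synthetic Gaussian noise is an acceptable stand-in for the Photoshop noise. `add_noise`, `make_stage` and the `sigmas` schedule are hypothetical; labels follow the question's `classes` list (0 = Modified, 1 = Original).

```python
import torch

torch.manual_seed(0)

def add_noise(images, sigma):
    """Copy of `images` (values in [0, 1]) with Gaussian noise of std `sigma`, clamped to [0, 1]."""
    return (images + sigma * torch.randn_like(images)).clamp(0.0, 1.0)

def make_stage(clean, sigma):
    """One curriculum stage: noisy copies labelled 0, clean images labelled 1."""
    noisy = add_noise(clean, sigma)
    data = torch.cat([noisy, clean])
    target = torch.cat([torch.zeros(len(clean), dtype=torch.long),
                        torch.ones(len(clean), dtype=torch.long)])
    return data, target

sigmas = [0.5, 0.25, 0.1, 0.05]       # strong (easy to spot) -> subtle (hard)
clean = torch.rand(8, 3, 32, 32)      # random stand-in for real images
stages = [make_stage(clean, sigma) for sigma in sigmas]

for sigma, (data, target) in zip(sigmas, stages):
    # train on this stage until it is learned, then move to the next, subtler one
    print(sigma, tuple(data.shape), target.tolist())
```

Each stage would be trained until the validation accuracy is clearly above chance before moving to the next, smaller `sigma`.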

Some ideas:
Dropout is needed when your network overfits, which doesn’t seem to be the case.
You need non-linearities (activation functions, e.g. ReLU).
You may want to start with already-designed networks. Anything that works for the MNIST dataset should be adequate for yours.
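
One more thing that may be worth checking in the posted training loop: `loadPoint` is called at the start of every epoch, so each epoch appears to restart from the last saved checkpoint instead of continuing from the previous epoch. A sketch of resuming once before the loop follows. `save_checkpoint` and `load_checkpoint` are renamed, hypothetical variants of the posted helpers, and the tiny linear model and fake validation loss are placeholders.

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.optim as optim

def save_checkpoint(path, model, optimizer, epoch, valid_loss_min):
    torch.save({
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'valid_loss_min': valid_loss_min,
    }, path)

def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return (epoch, valid_loss_min)."""
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return checkpoint['epoch'], checkpoint['valid_loss_min']

model = nn.Linear(4, 2)                          # tiny stand-in for Net()
optimizer = optim.SGD(model.parameters(), lr=0.005)
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pt')

# Resume once, before the epoch loop, and only if a checkpoint exists.
start_epoch, valid_loss_min = 1, float('inf')
if os.path.exists(path):
    last_epoch, valid_loss_min = load_checkpoint(path, model, optimizer)
    start_epoch = last_epoch + 1

for epoch in range(start_epoch, start_epoch + 3):
    valid_loss = 1.0 / epoch                     # placeholder for a real validation pass
    if valid_loss <= valid_loss_min:
        valid_loss_min = valid_loss              # update first, then save the new minimum
        save_checkpoint(path, model, optimizer, epoch, valid_loss_min)

resumed_epoch, resumed_min = load_checkpoint(path, model, optimizer)
print(resumed_epoch, resumed_min)
```

Loading the best checkpoint again would then only happen after training, right before the test pass.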

Also make sure your dataset is correct. JPEG images are noisy due to compression. You have to be sure that the splits are perceptually OK.
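
A quick numeric check of the split can complement looking at the images. This sketch mirrors the shuffled 80/20 index split from the question and counts the classes on each side. The `targets` list is a synthetic stand-in for `train_data.targets`, and `split_class_counts` is a hypothetical helper.

```python
import numpy as np

def split_class_counts(targets, valid_size=0.2, seed=0):
    """Shuffle indices, split off `valid_size` for validation, and return
    per-class counts for the training part and the validation part."""
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(targets))
    split = int(np.floor(valid_size * len(targets)))
    valid_idx, train_idx = indices[:split], indices[split:]
    n_classes = int(targets.max()) + 1
    train_counts = np.bincount(targets[train_idx], minlength=n_classes)
    valid_counts = np.bincount(targets[valid_idx], minlength=n_classes)
    return train_counts, valid_counts

# Synthetic labels matching the sizes in the question: 7500 images per class.
targets = [0] * 7500 + [1] * 7500
train_counts, valid_counts = split_class_counts(targets)
print('train:', train_counts, 'valid:', valid_counts)
```

Strongly unbalanced counts on either side would be a reason to redo the split. Whether the images themselves look right still has to be checked by eye.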


Thanks for your answer! I will try changing things following your advice!

Also waiting for other people’s comments :slight_smile: