Need help with my first CNN

speedymcs · June 13, 2018, 5:17pm

Hi, I’m trying to do an image classification task with my first neural network. I have minimal practical experience. I have about 650 8-bit gray value images of dimensions 21x21x21 that I want to put into two classes. Starting from one of the official tutorials here, after some fiddling I now have this code:

import os
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
import helpers

batch_size = 4
rng = np.random.RandomState(128)

#=========================================================================
# Load data
#=========================================================================
# truth/target values
path = "/NN_ROI_classification_data/"
truth = pd.read_csv(os.path.join(path, 'assessments.csv'))
# images
temp = []
for img_name in truth.Number:
    image_path = os.path.join(path, 'images', str(img_name))
    img = helpers.read_image_sequence(image_path)
    img = img.astype('float32')
    temp.append(img)
imgs = np.stack(temp)
imgs /= 255.0

truth = truth.Type.values

# Split data into a training and a validation set
training_no = 350
train_imgs, val_imgs = imgs[:training_no], imgs[training_no:]
train_truth, val_truth = truth[:training_no], truth[training_no:]


#=========================================================================
# Define net
#=========================================================================
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 21 input channels (the 21 depth slices are treated as channels),
        # 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(21, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 2 * 2, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 2 * 2)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


def batch_creator(batch_size):
    dataset_length = train_imgs.shape[0]

    batch_mask = rng.choice(dataset_length, batch_size)

    batch_imgs = train_imgs[batch_mask]
    batch_truth = train_truth[batch_mask]

    return batch_imgs, batch_truth


#=========================================================================
# Train net
#=========================================================================
total_batch = int(train_truth.shape[0] / batch_size)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i in range(total_batch):
        # get the inputs
        img_batch, truth_batch = batch_creator(batch_size)
        imgs = Variable(torch.from_numpy(img_batch))
        truth = Variable(torch.from_numpy(truth_batch),
                         requires_grad=False)  # @UndefinedVariable

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(imgs)
        loss = criterion(outputs, truth)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

The line where the loss is computed generates this error:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /pytorch/aten/src/THNN/generic/ClassNLLCriterion.c:97

At that point, this is what truth looks like:

tensor([ 1,  1,  0,  1])

And this is outputs:

tensor(1.00000e-02 *
       [[-2.2640],
        [-3.1873],
        [-2.1566],
        [-2.4766]])

ptrblck · June 13, 2018, 8:54pm

Your last linear layer (fc3) should have 2 output units, since you are using CrossEntropyLoss.
Just change it to self.fc3 = nn.Linear(84, 2) and run it again.
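A minimal sketch of the shapes nn.CrossEntropyLoss expects, using random features in place of the real net (the standalone fc3 layer and the feature tensor are stand-ins, not the poster's code). The loss takes one raw score per class, shape (batch, n_classes), and class indices as targets, shape (batch,). With a single output unit n_classes is 1, so a target of 1 falls outside the valid range, which is what the assertion reports. The targets stay plain 0/1 indices and do not need to be one-hot encoded.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the last layer: 84 features in, 2 class scores out.
fc3 = nn.Linear(84, 2)
criterion = nn.CrossEntropyLoss()

features = torch.randn(4, 84)        # batch of 4 feature vectors (random stand-in)
truth = torch.tensor([1, 1, 0, 1])   # class indices, dtype int64, shape (4,)

outputs = fc3(features)              # raw scores (logits), shape (4, 2)
loss = criterion(outputs, truth)     # scalar tensor

# The predicted class is the index of the larger score in each row.
predicted = outputs.argmax(dim=1)    # shape (4,)
```

The softmax is applied inside the loss, so the net itself returns the raw scores.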

speedymcs · June 14, 2018, 8:52am

Thanks, that did it! Well, it runs, but it doesn’t work well:

...
[2,    80] loss: 0.474
[2,    81] loss: 0.951
[2,    82] loss: 0.837
[2,    83] loss: 0.476
[2,    84] loss: 0.828
[2,    85] loss: 0.725
[2,    86] loss: 0.717
[2,    87] loss: 0.830
Finished Training

My reasoning was, since I have two classes and truth values of 0 or 1 for each image, the net should have one output that is either closer to 0 or 1. Now that the net’s output is a vector with two entries, do the truth values also have to have two entries per image? I.e. (0,1) or (1,0)?

Jk749 · June 14, 2018, 10:13am

Since you have only two classes, it is better to use binary cross entropy as the loss function and change your last layer back to self.fc3 = nn.Linear(84, 1), as it was initially.

BCE loss in pytorch → torch.nn.BCELoss()

speedymcs · June 14, 2018, 11:54am

Thanks for the tip! BCELoss() expects an input between 0 and 1, while my net outputs have entries like -5. My first thought was to normalize each output vector to [0,1], but on second thought, that’s a bad idea, right? Wouldn’t that change the overall loss values in a nonlinear way? If so, what can I do instead?

ptrblck · June 14, 2018, 12:05pm

Using BCELoss as @Jk749 suggested is another valid approach.
Just use one output unit, add a sigmoid at your last layer, and try BCELoss.

...
x = F.sigmoid(self.fc3(x))
return x
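A small sketch of the single-output setup, again with random features standing in for the real net (the standalone fc3 and the feature tensor are assumptions). The sigmoid maps each raw output to (0, 1) independently of the other samples in the batch, so no per-batch normalization is needed. nn.BCELoss expects float targets with the same shape as the outputs. nn.BCEWithLogitsLoss is shown as an alternative that applies the sigmoid inside the loss, in which case the net returns the raw output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc3 = nn.Linear(84, 1)               # hypothetical stand-in for the last layer
features = torch.randn(4, 84)        # random stand-in for the fc2 activations
truth = torch.tensor([1, 0, 0, 1])   # integer labels, shape (4,)

# BCELoss needs float targets shaped like the outputs: (4, 1).
target = truth.float().unsqueeze(1)

# Option A: explicit sigmoid followed by BCELoss.
probs = torch.sigmoid(fc3(features))
loss_a = nn.BCELoss()(probs, target)

# Option B: keep the raw outputs and let the loss apply the sigmoid.
logits = fc3(features)
loss_b = nn.BCEWithLogitsLoss()(logits, target)
```

Both options compute the same quantity; option B is generally the more numerically stable of the two.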

speedymcs · June 14, 2018, 12:54pm

Ah, that was easy, thanks. It does little to improve the performance though:

...
Truth:  tensor([ 1.,  0.,  0.,  0.])
Output:  tensor([[ 0.5666],
        [ 0.5650],
        [ 0.5653],
        [ 0.5655]])
[2,    85] loss: 0.703
Truth:  tensor([ 1.,  0.,  0.,  1.])
Output:  tensor([[ 0.5657],
        [ 0.5688],
        [ 0.5657],
        [ 0.5668]])
[2,    86] loss: 0.702
Truth:  tensor([ 0.,  1.,  1.,  0.])
Output:  tensor([[ 0.5679],
        [ 0.5672],
        [ 0.5660],
        [ 0.5657]])
[2,    87] loss: 0.768
Truth:  tensor([ 1.,  0.,  0.,  0.])
Output:  tensor([[ 0.5679],
        [ 0.5659],
        [ 0.5667],
        [ 0.5656]])
Finished Training

Hmm. In the tutorial where I started from, 3-channel (RGB) images are categorized into ten classes. Now I have 21 channels and two classes. I guess the net architecture must be changed further…?
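One possible direction, sketched under assumptions: instead of feeding the 21 depth slices as 21 channels of a 2D convolution, treat each ROI as a single-channel volume and use 3D convolutions. Net3d below is a hypothetical variant of the tutorial net, not something suggested in the thread, and nothing here shows that it would train better on this data. It only illustrates the required input shape, (batch, 1, 21, 21, 21), and the resulting feature size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net3d(nn.Module):
    """Hypothetical 3D variant of the tutorial net for 21x21x21 volumes."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(1, 6, 5)     # 21 -> 17 along each axis
        self.pool = nn.MaxPool3d(2, 2)      # 17 -> 8, and later 4 -> 2
        self.conv2 = nn.Conv3d(6, 16, 5)    # 8 -> 4 along each axis
        self.fc1 = nn.Linear(16 * 2 * 2 * 2, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 2)         # two class scores for CrossEntropyLoss

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)           # flatten to (batch, 128)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


volumes = torch.rand(4, 21, 21, 21)         # random stand-in for four ROIs
batch = volumes.unsqueeze(1)                # add channel axis: (4, 1, 21, 21, 21)
outputs = Net3d()(batch)                    # shape (4, 2)
```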
I’ve trained an SVM with this same data and achieved about 95% accuracy.

ptrblck · June 14, 2018, 1:07pm

Have you trained the SVM on the images directly?
You could play around with the learning rate, optimizers etc.
What kind of data do you have?
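A rough sketch of what trying different learning rates and optimizers could look like. Everything here is a stand-in: the data is random, make_model is a deliberately tiny placeholder rather than the net from the thread, and the candidate settings are arbitrary. It also uses a DataLoader with shuffle=True, which visits every sample once per epoch, whereas rng.choice in batch_creator samples with replacement by default.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data: 64 samples with 21 channels of 21x21, two classes.
images = torch.rand(64, 21, 21, 21)
labels = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)


def make_model():
    # Tiny placeholder model so each candidate starts from a fresh network.
    return nn.Sequential(nn.Flatten(), nn.Linear(21 * 21 * 21, 2))


# Arbitrary candidate settings, each a function from parameters to an optimizer.
candidates = {
    "sgd_lr1e-4": lambda p: optim.SGD(p, lr=1e-4, momentum=0.9),
    "sgd_lr1e-3": lambda p: optim.SGD(p, lr=1e-3, momentum=0.9),
    "adam_lr1e-3": lambda p: optim.Adam(p, lr=1e-3),
}

criterion = nn.CrossEntropyLoss()
results = {}
for name, make_optimizer in candidates.items():
    model = make_model()
    optimizer = make_optimizer(model.parameters())
    for epoch in range(2):
        epoch_loss = 0.0
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
    results[name] = epoch_loss / len(loader)   # mean loss over the last epoch
    print(name, results[name])
```

On real data, comparing the candidates on the validation split rather than on the training loss gives a fairer picture.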

speedymcs · June 14, 2018, 1:12pm

Will do!
Each image is a 21x21x21px 3D ROI of an 8-bit grey value phase microscopy image, and the classifier should decide whether there’s a (one) cell present or not. I used the SVM from scikit-learn and trained on the images directly, in the unraveled shape of 1x9261.
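For reference, the unraveling described above can be sketched as follows. The data is random, and the SVC is used with its default settings, which is an assumption since the post does not say which kernel or parameters were used.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(128)

# Random stand-in for the real ROIs: 40 volumes of 21x21x21 with 0/1 labels.
imgs = rng.rand(40, 21, 21, 21).astype("float32")
truth = rng.randint(0, 2, size=40)
truth[:2] = [0, 1]                    # make sure both classes occur in training

# Unravel each volume into one row of 21 * 21 * 21 = 9261 features.
X = imgs.reshape(len(imgs), -1)       # shape (40, 9261)

clf = SVC()                           # default settings, an assumption
clf.fit(X[:30], truth[:30])
predictions = clf.predict(X[30:])     # one predicted label per held-out volume
```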