Need help with my first CNN

Hi, I’m trying to do an image classification task with my first neural network. I have minimal practical experience. I have about 650 8-bit gray value images of dimensions 21x21x21 that I want to put into two classes. Starting from one of the official tutorials here, after some fiddling I now have this code:

import os
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
import helpers

batch_size = 4
rng = np.random.RandomState(128)

#=========================================================================
# Load data
#=========================================================================
# truth/target values
path = "/NN_ROI_classification_data/"
truth = pd.read_csv(os.path.join(path, 'assessments.csv'))
# images
temp = []
for img_name in truth.Number:
    image_path = os.path.join(path, 'images', str(img_name))
    img = helpers.read_image_sequence(image_path)
    img = img.astype('float32')
    temp.append(img)
imgs = np.stack(temp)
imgs /= 255.0

truth = truth.Type.values

# Split data into a training and a validation set
training_no = 350
train_imgs, val_imgs = imgs[:training_no], imgs[training_no:]
train_truth, val_truth = truth[:training_no], truth[training_no:]


#=========================================================================
# Define net
#=========================================================================
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 21 input channels (the 21 slices of each ROI), 6 output channels,
        # 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(21, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 2 * 2, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 2 * 2)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


def batch_creator(batch_size):
    dataset_length = train_imgs.shape[0]

    batch_mask = rng.choice(dataset_length, batch_size)

    batch_imgs = train_imgs[batch_mask]
    batch_truth = train_truth[batch_mask]

    return batch_imgs, batch_truth


#=========================================================================
# Train net
#=========================================================================
total_batch = int(train_truth.shape[0] / batch_size)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i in range(total_batch):
        # get the inputs
        img_batch, truth_batch = batch_creator(batch_size)
        imgs = Variable(torch.from_numpy(img_batch))
        truth = Variable(torch.from_numpy(truth_batch),
                         requires_grad=False)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(imgs)
        loss = criterion(outputs, truth)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

The line where the loss is computed generates this error:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /pytorch/aten/src/THNN/generic/ClassNLLCriterion.c:97

At that point, this is what truth looks like:

tensor([ 1,  1,  0,  1])

And this is outputs:

tensor(1.00000e-02 *
       [[-2.2640],
        [-3.1873],
        [-2.1566],
        [-2.4766]])

Your last linear layer (fc3) needs 2 output units: nn.CrossEntropyLoss expects one output (logit) per class, and your targets are the class indices 0 and 1.
Just change it to self.fc3 = nn.Linear(84, 2) and run it again.
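
A minimal sketch of the shapes involved, assuming a batch of 4 as in the post. The random tensors are only stand-ins for the real fc3 output, not values from the thread:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the output of fc3 = nn.Linear(84, 2):
# one row per image, one column (logit) per class.
outputs = torch.randn(4, 2)
# Class indices like the batch shown in the post; dtype is int64 (long).
truth = torch.tensor([1, 1, 0, 1])

criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, truth)
print(outputs.shape, truth.shape, loss.item())

# With a single output column there is only "class 0", so a target of 1
# is out of range. This is the situation the assertion complained about.
try:
    criterion(torch.randn(4, 1), truth)
    single_column_failed = False
except (IndexError, RuntimeError) as err:
    single_column_failed = True
    print("single-column output fails:", type(err).__name__)
```

The exact exception type and message depend on the PyTorch version, which is why both are caught here.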

Thanks, that did it! Well, it runs, but it doesn’t work well:

...
[2,    80] loss: 0.474
[2,    81] loss: 0.951
[2,    82] loss: 0.837
[2,    83] loss: 0.476
[2,    84] loss: 0.828
[2,    85] loss: 0.725
[2,    86] loss: 0.717
[2,    87] loss: 0.830
Finished Training

My reasoning was, since I have two classes and truth values of 0 or 1 for each image, the net should have one output that is either closer to 0 or 1. Now that the net’s output is a vector with two entries, do the truth values also have to have two entries per image? I.e. (0,1) or (1,0)?
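
For what it's worth, a small sketch suggesting that the truth values can stay as plain class indices: nn.CrossEntropyLoss / F.cross_entropy builds the one-hot comparison internally. The logits below are made-up numbers for illustration only:

```python
import torch
import torch.nn.functional as F

# Made-up logits for 3 images and 2 classes (illustrative values only).
logits = torch.tensor([[0.2, 1.5],
                       [1.0, -0.3],
                       [0.1, 0.4]])
# Targets stay as class indices, one integer per image.
truth = torch.tensor([1, 0, 0])

# Loss computed directly from class indices.
loss_index = F.cross_entropy(logits, truth)

# The same loss written out by hand with explicit one-hot vectors.
one_hot = F.one_hot(truth, num_classes=2).float()
loss_one_hot = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# The predicted class is the column with the largest logit.
predicted = logits.argmax(dim=1)
print(loss_index.item(), loss_one_hot.item(), predicted.tolist())
```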

Since you have only two classes, it would be better to use binary cross entropy as the loss function and change your last layer back to self.fc3 = nn.Linear(84, 1), as it was initially.

BCE loss in PyTorch: torch.nn.BCELoss()

Thanks for the tip! BCELoss() expects an input between 0 and 1, while my net outputs have entries like -5. My first thought was to normalize each output vector to [0,1], but on second thought, that’s a bad idea, right? Wouldn’t that change the overall loss values in a nonlinear way? If so, what can I do instead?

Using BCELoss as @Jk749 suggested is another valid approach.
Just use one output unit, add a sigmoid at your last layer, and try BCELoss.

...
x = torch.sigmoid(self.fc3(x))
return x
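
A hedged sketch of how the pieces fit together, using made-up raw outputs in place of the real fc3 values. It assumes the targets are converted to float and reshaped to match the (batch, 1) output, which recent PyTorch versions require. nn.BCEWithLogitsLoss is shown as an alternative that applies the sigmoid internally, so the net would return the raw fc3 output in that case:

```python
import torch
import torch.nn as nn

# Made-up raw fc3 outputs (unbounded, e.g. -5), shape (batch, 1).
raw = torch.tensor([[-5.0], [0.3], [2.0], [-1.2]])
# Integer class labels as loaded from the CSV in the post.
truth = torch.tensor([1, 0, 0, 1])

# BCELoss wants float targets with the same shape as the input.
target = truth.float().unsqueeze(1)

# Option 1: sigmoid in the net, then BCELoss.
probs = torch.sigmoid(raw)
loss_bce = nn.BCELoss()(probs, target)

# Option 2: no sigmoid in the net, BCEWithLogitsLoss on the raw output.
loss_logits = nn.BCEWithLogitsLoss()(raw, target)

print(probs.squeeze(1).tolist(), loss_bce.item(), loss_logits.item())
```

Both options compute the same quantity; the second is generally the more numerically stable of the two.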

Ah, that was easy, thanks. It does little to improve the performance though:

...
Truth:  tensor([ 1.,  0.,  0.,  0.])
Output:  tensor([[ 0.5666],
        [ 0.5650],
        [ 0.5653],
        [ 0.5655]])
[2,    85] loss: 0.703
Truth:  tensor([ 1.,  0.,  0.,  1.])
Output:  tensor([[ 0.5657],
        [ 0.5688],
        [ 0.5657],
        [ 0.5668]])
[2,    86] loss: 0.702
Truth:  tensor([ 0.,  1.,  1.,  0.])
Output:  tensor([[ 0.5679],
        [ 0.5672],
        [ 0.5660],
        [ 0.5657]])
[2,    87] loss: 0.768
Truth:  tensor([ 1.,  0.,  0.,  0.])
Output:  tensor([[ 0.5679],
        [ 0.5659],
        [ 0.5667],
        [ 0.5656]])
Finished Training

Hmm. In the tutorial where I started from, 3-channel (RGB) images are categorized into ten classes. Now I have 21 channels and two classes. I guess the net architecture must be changed further…?
I’ve trained an SVM with this same data and achieved about 95% accuracy.
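
One way to see what the current architecture does with a 21-channel 21x21 input is to trace the tensor shapes through the convolution and pooling layers. This sketch reuses the layer definitions from the post with a dummy all-zero batch. It only shows the shapes and makes no claim about why the net performs poorly:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer definitions as in Net above.
conv1 = nn.Conv2d(21, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

# Dummy batch: 4 ROIs, 21 slices treated as channels, 21x21 pixels each.
x = torch.zeros(4, 21, 21, 21)

after_conv1 = F.relu(conv1(x))          # 21 - 5 + 1 = 17 -> (4, 6, 17, 17)
after_pool1 = pool(after_conv1)         # 17 // 2 = 8     -> (4, 6, 8, 8)
after_conv2 = F.relu(conv2(after_pool1))  # 8 - 5 + 1 = 4 -> (4, 16, 4, 4)
after_pool2 = pool(after_conv2)         # 4 // 2 = 2      -> (4, 16, 2, 2)
flat = after_pool2.view(-1, 16 * 2 * 2)  # 64 features per ROI reach fc1

for name, t in [("conv1", after_conv1), ("pool1", after_pool1),
                ("conv2", after_conv2), ("pool2", after_pool2),
                ("flat", flat)]:
    print(name, tuple(t.shape))
```

So each 21x21x21 ROI (9261 values) is reduced to 64 features before the fully connected layers.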

Have you trained the SVM on the images directly?
You could play around with the learning rate, optimizers etc.
What kind of data do you have?
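
As a rough sketch of what playing around with these settings could look like: the tiny linear model, the random data, and the specific learning rates below are all hypothetical placeholders, not recommendations for this dataset.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data standing in for the real image batches.
x = torch.randn(16, 8)
y = torch.randint(0, 2, (16, 1)).float()
criterion = nn.BCEWithLogitsLoss()

# Candidate settings to try one at a time (illustrative values only).
candidates = {
    "sgd_lr0.001": lambda params: optim.SGD(params, lr=0.001, momentum=0.9),
    "sgd_lr0.01": lambda params: optim.SGD(params, lr=0.01, momentum=0.9),
    "adam_lr0.001": lambda params: optim.Adam(params, lr=0.001),
}

results = {}
for name, make_optimizer in candidates.items():
    torch.manual_seed(0)
    model = nn.Linear(8, 1)  # fresh stand-in model for every setting
    optimizer = make_optimizer(model.parameters())
    for _ in range(50):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    results[name] = loss.item()
    print(name, results[name])
```

With the real net, the same loop structure would wrap Net() and the training batches, and the comparison would be made on the validation set rather than the training loss.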

Will do!
Each image is a 21x21x21 px 3D ROI of an 8-bit grey value phase microscopy image, and the classifier should decide whether there’s a (one) cell present or not. I used the SVM from scikit-learn and trained on the images directly, in the unraveled shape of 1x9261.
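
For reference, the unraveled shape mentioned above corresponds to flattening each ROI into one row. A minimal NumPy sketch with an all-zero placeholder array (650 is the approximate image count from the first post):

```python
import numpy as np

# Placeholder array standing in for the loaded ROIs: (images, 21, 21, 21).
rois = np.zeros((650, 21, 21, 21), dtype=np.float32)

# Flatten each 3D ROI into a single feature vector of 21 * 21 * 21 values.
flat = rois.reshape(len(rois), -1)
print(flat.shape)
```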