CNN results negative when using log_softmax and nll loss

Hi all, I’m using the nll_loss function in conjunction with log_softmax, as advised in the documentation, when creating a CNN. However, when I test new images, I get negative numbers rather than results bounded between 0 and 1. This seems really strange given the bounded nature of the softmax function, and I was wondering if anyone has encountered this problem or can see where I’m going wrong?


import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision

import matplotlib.pyplot as plt
import numpy as np

transform = transforms.Compose(
                   [transforms.Resize((32,32)),
                    transforms.ToTensor(),
                    ])

trainset = dset.ImageFolder(root="Image_data",transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=10,shuffle=True)

testset = dset.ImageFolder(root='tests',transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=10,shuffle=True)

classes=('Cats','Dogs')

def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

dataiter = iter(train_loader)
images, labels = dataiter.next()
imshow(torchvision.utils.make_grid(images))
plt.show()

dataiter = iter(test_loader)
images, labels = dataiter.next()
imshow(torchvision.utils.make_grid(images))
plt.show()

class Net(nn.Module):
    
    def __init__(self):
        super(Net,self).__init__()
        self.conv1  = nn.Conv2d(3,32,5,padding=2) # 3 input channels, 32 out, filter size = 5x5, 2 block outer padding
        self.conv2  = nn.Conv2d(32,64,5,padding=2) # 32 input, 64 out,  filter size = 5x5, 2 block padding
        self.fc1    = nn.Linear(64*8*8,1024) # Fully connected layer 
        self.fc2    = nn.Linear(1024,2) #Fully connected layer 2 out.
        
    def forward(self,x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2) # Max pool over convolution with 2x2 pooling 
        x = F.max_pool2d(F.relu(self.conv2(x)), 2) # Max pool over convolution with 2x2 pooling 
        x = x.view(-1,64*8*8) # tensor.view() reshapes the tensor
        x = F.relu(self.fc1(x)) # Activation function after passing through fully connected layer
        x = F.dropout(x, training=True) #Dropout regularisation
        x = self.fc2(x) # Pass through final fully connected layer
        return F.log_softmax(x,dim=1) # Return log-probabilities via log_softmax
  
model = Net()
print(model) 

optimizer = optim.Adam(model.parameters(), lr=0.0001)

model.train()
train_loss = []
train_accu = []
i = 0
batch_size = 10
for epoch in range(10):
    for data, target in train_loader:
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target) # Negative log likelihood (pairs with log_softmax).
        loss.backward()    # compute gradients
        train_loss.append(loss.data[0]) # store the loss
        optimizer.step()   # update parameters
        prediction = output.data.max(1)[1]   # index of the max log-probability
        accuracy = (prediction.eq(target.data).sum()/batch_size)*100
        train_accu.append(accuracy)
        if i % 10 == 0:
            print('Epoch:',str(epoch),'Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.data[0], accuracy))
        i += 1

from PIL import Image

loader = transform

def image_loader(image_name):
    image = Image.open(image_name)
    image = loader(image).float()
    image = Variable(image, requires_grad=True)
    image = image.unsqueeze(0)  # add a batch dimension
    return image

image = image_loader('cat_test.jpg')
image2 = image_loader('dog_test.jpg')


prediction = model(image)
print(prediction)

prediction = model(image2)
print(prediction)

Many thanks in advance!

Since you are using the logarithm on softmax, you will get numbers in [-inf, 0], since log(0)=-inf and log(1)=0.
You could get the probabilities back by using torch.exp(output).
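For illustration, a minimal sketch (modern-style PyTorch, with illustrative values):

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0]])      # example raw scores from a final layer
log_probs = F.log_softmax(logits, dim=1)  # tensor([[-0.0486, -3.0486]]) -- all <= 0
probs = torch.exp(log_probs)              # tensor([[0.9526, 0.0474]]) -- back in [0, 1]
print(probs.sum())                        # sums to 1, as expected of a softmax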

Thanks ptrblck, that worked perfectly. Out of curiosity, for a case like this, is there any reason why pushing the output to one value rather than using two (like in one-hot encoding and above) wouldn’t work?

The optimizer tries to minimize the loss.
It doesn’t matter whether you use one-hot encoding or an index target; the loss will be the same.
One-hot targets would just be used as indices anyway, so we save that transformation and feed the index vector directly.
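To make this concrete, here is a small sketch (illustrative values) showing that a one-hot target carries the same information as an index target:

import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 2), dim=1)  # model output for a batch of 4
one_hot = torch.tensor([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
index_target = one_hot.argmax(dim=1)                 # tensor([0, 1, 0, 1])
loss = F.nll_loss(log_probs, index_target)           # nll_loss consumes the index form directly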

What do you exactly mean by “pushing the output result to one value rather than using two” ?

Okay, I think I understand: it doesn’t matter whether you use one-hot encoding or indexing, because the optimizer will simply minimise the loss between the target and the predicted class regardless of how the class labels are formatted, as long as the format is the same for both input and output?

My question was really trying to understand whether I have my thoughts right on the following:

I’m trying to classify images as either ‘cat’ or ‘dog’. Therefore, in my mind, if I return two outputs from the final fully connected layer, this is essentially one-hot encoding, where one of the two values represents ‘dog’ and the other ‘cat’.

If I used one output instead of two in that final fully connected layer, I would expect it to push the ‘dog’ class to say 0 and ‘cat’ to 1 or vice versa.

Maybe this thinking is completely wrong though!

Ah ok, I think I misunderstood your question!

As a side note, for multi-class classification PyTorch expects class indices in the target tensor; you cannot use one-hot encoded targets. The model output will have dimensions [batch_size, number_of_classes] though.

In your use case (2-class classification) you can use either of two approaches.
You could use a linear layer with 2 output units, apply log_softmax on them and use NLLLoss, or you could use a linear layer with only 1 output, apply sigmoid on it and use BCELoss.

I’m not sure if one approach is superior to the other, since I see both methods used a lot.
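For reference, a minimal sketch of both heads (the 1024-dim feature size is just borrowed from the model in this thread):

import torch
import torch.nn as nn
import torch.nn.functional as F

features = torch.randn(10, 1024)     # stand-in for the fc1 activations, batch of 10
target = torch.empty(10).random_(2)  # random 0/1 labels

# Approach 1: two output units + log_softmax + NLLLoss
fc_two = nn.Linear(1024, 2)
loss_two = F.nll_loss(F.log_softmax(fc_two(features), dim=1), target.long())

# Approach 2: one output unit + sigmoid + BCELoss
fc_one = nn.Linear(1024, 1)
loss_one = F.binary_cross_entropy(torch.sigmoid(fc_one(features)).squeeze(1), target)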

Your idea was right!

Okay, just so I understand that right, you can’t pass PyTorch say [1,0] for a cat label and [0,1] for a dog label for all the training images?

I’m not sure I understand how you specify the dimensions [batch_size, number_of_classes] during model building when I’ve already specified the number of outputs as two in this case. But I think what you’re saying is that the output will be, say, 10 by 2 if you have 10 training images in a batch, whereas when you test a single image it will come out as 1 by 2?

Thanks by the way, this is really helpful!

Yes, you are correct.
You would have to use something like this as your target (for 2 output units):

batch_size = 10
nb_classes = 2
target = torch.LongTensor(batch_size).random_(nb_classes)

Yes, you are also right about the sizes. Sometimes you need to unsqueeze the target in case the batch dimension is missing.
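Putting the shapes together, a quick sketch:

import torch
import torch.nn.functional as F

batch_size, nb_classes = 10, 2
target = torch.LongTensor(batch_size).random_(nb_classes)           # shape [10], values in {0, 1}
output = F.log_softmax(torch.randn(batch_size, nb_classes), dim=1)  # shape [10, 2]
loss = F.nll_loss(output, target)   # output is [batch, classes], target is just [batch]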

Okay, I’ll have a play around and see what I can do. I think my understanding of some of the basics, like defining the output in the form that I want, is quite limited, but I understand what you’ve been saying!

Hi ptrblck,

I don’t know if you’re still following this thread, but I’ve been reading around and trying to understand how to make this CNN work as I expect it to, and I’ve just made myself more confused.

Do you have any advice on why my model does not converge to a higher accuracy? When I run the code above, my accuracy throughout training is very erratic rather than improving over time as one would expect.

[plot: training accuracy over training steps]

I suspect this might have something to do with how the images are stored in a particular folder structure and how labels get assigned to those folders, or maybe my training code has a glaring mistake?

Many thanks in advance!

Is the accuracy for the training set?
How many images do you have?
It seems the number of images is quite low, since the accuracy makes such jumps between iterations.

Also, you are fixing dropout to training mode:

x = F.dropout(x, training=True)

During evaluation dropout will still be active.
Change it to:

x = F.dropout(x, training=self.training)

to deactivate it during evaluation.
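With that change, dropout simply follows the module’s train/eval mode, e.g. (using the Net defined earlier in this thread):

import torch

model = Net()
x = torch.randn(1, 3, 32, 32)  # dummy 32x32 RGB input

model.train()   # sets self.training = True  -> dropout is active
model.eval()    # sets self.training = False -> dropout becomes a no-op
out = model(x)  # evaluation-mode forward pass, dropout disabled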

Yes, this is the training accuracy, plotted using:

ax.plot(np.arange(len(train_accu)), train_accu)

I’ve altered the dropout regularisation using the code you provided above.

I’m not actually loading in any labels for the images. I was assuming that the way the DataLoader worked was that if you organised images of, say, cats into a folder called ‘Cats’ (and the same for ‘Dogs’) under ‘Image_data’, it would assign the labels automatically, e.g.:

trainset = dset.ImageFolder(root="Image_data",transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=10,shuffle=True)

Maybe you have to pre-process the images to have associated labels or a .csv containing image names and labels?

The ImageFolder class creates the labels for you (see the ImageFolder source).
As long as your images are in the appropriate folder the labels should be alright.
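You can verify the assigned labels via the dataset’s classes and class_to_idx attributes, e.g. (reusing the transform from above):

import torchvision.datasets as dset

trainset = dset.ImageFolder(root="Image_data", transform=transform)
print(trainset.classes)       # e.g. ['Cats', 'Dogs'] -- subfolder names, sorted
print(trainset.class_to_idx)  # e.g. {'Cats': 0, 'Dogs': 1}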

How many images do you have?

Okay that’s good news then!

Sorry I forgot to say, I have ~119 training images with 18 images for testing and 2 standalone images which I use for testing the model after completing training.

I realise that’s not many images but I’m only doing this as a test before using a much larger training set for my work.

The training accuracy looks strange for ~119 images.
Could you print np.unique(train_accu) and see how many “levels” there are?
It looks like there are only 4 levels in the plot.
Your model should overfit pretty easily, so I’m wondering why the accuracy is sometimes 0.

I think that’s due to the batch size, as I recently set it to 4. Apologies, I didn’t update the code above; I’ll do that here.

If there are only 4 images to classify per batch, it would make sense that the only available accuracy values are 0, 25, 50, 75, and 100.

print(np.unique(train_accu))
[ 25.  50.  75. 100.]

And here is the updated code:


import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision

import matplotlib.pyplot as plt
import numpy as np

batch_size = 4

transform = transforms.Compose(
                   [transforms.Resize((32,32)),
                    transforms.ToTensor(),
                    ])

trainset = dset.ImageFolder(root="Image_data",transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,shuffle=True)

testset = dset.ImageFolder(root='tests',transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size,shuffle=False)

classes=('Cats','Dogs')

def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

dataiter = iter(train_loader)
images, labels = dataiter.next()
imshow(torchvision.utils.make_grid(images))
plt.show()

dataiter = iter(test_loader)
images, labels = dataiter.next()
imshow(torchvision.utils.make_grid(images))
plt.show()

class Net(nn.Module):
    
    def __init__(self):
        super(Net,self).__init__()
        self.conv1  = nn.Conv2d(3,32,5,padding=2) # 3 input channels, 32 out, filter size = 5x5, 2 block outer padding
        self.conv2  = nn.Conv2d(32,64,5,padding=2) # 32 input, 64 out,  filter size = 5x5, 2 block padding
        self.fc1    = nn.Linear(64*8*8,1024) # Fully connected layer 
        self.fc2    = nn.Linear(1024,2) # Fully connected layer, 2 out.
        
    def forward(self,x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2) # Max pool over convolution with 2x2 pooling 
        x = F.max_pool2d(F.relu(self.conv2(x)), 2) # Max pool over convolution with 2x2 pooling 
        x = x.view(-1,64*8*8) # tensor.view() reshapes the tensor
        x = F.relu(self.fc1(x)) # Activation function after passing through fully connected layer
        x = F.dropout(x, training= self.training) #Dropout regularisation
        x = self.fc2(x) # Pass through final fully connected layer
        output = F.log_softmax(x,dim=1) # Log-probabilities via log_softmax
        return torch.exp(output)
  
model = Net()
print(model) 

optimizer = optim.Adam(model.parameters(), lr=0.01)

model.train()
train_loss = []
train_accu = []
i = 0
for epoch in range(1):
    for data, target in train_loader:
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target) # Negative log likelihood (pairs with log_softmax).
        loss.backward()    # compute gradients
        train_loss.append(loss.data[0]) # store the loss
        optimizer.step()   # update parameters
        prediction = output.data.max(1)[1]   # index of the max log-probability
        accuracy = (prediction.eq(target.data).sum()/batch_size)*100
        train_accu.append(accuracy)
        if i % 10 == 0:
            print('Epoch:',str(epoch),'Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.data[0], accuracy))
        i += 1

plt.figure()
plt.plot(np.arange(len(train_loss)),train_loss)
plt.show()

plt.figure()
plt.plot(np.arange(len(train_accu)), train_accu)
plt.show()

from PIL import Image

loader = transform

def image_loader(image_name):
    image = Image.open(image_name)
    image = loader(image).float()
    image = Variable(image, requires_grad=True)
    image = image.unsqueeze(0)  # add a batch dimension
    return image

image = image_loader('test_data/Cat/cat_test.jpg')
image2 = image_loader('test_data/Cat/dog_test.jpg')

prediction = model(image)
print(prediction)

prediction = model(image2)
print(prediction)

Yes, that makes total sense!
Try to lower your learning rate and run it again. The model should reach 100% accuracy pretty fast.

Okay I gave that a go, it still seems to be having trouble though:

[plot: training accuracy over training steps]

print(np.unique(train_accu))
[  0.  25.  50.  75. 100.]

I don’t know if it helps, but testing a cat and a dog image after training gives these results:

Cat test:

Variable containing:
 1.0000e+00  3.0693e-09
[torch.FloatTensor of size 1x2]

Dog test:

Variable containing:
 1.0000e+00  1.5074e-07
[torch.FloatTensor of size 1x2]

I think your model might be outputting just one class (class 0), and based on the class distribution in the current batch, you get these accuracy results.

One mistake I just realized:
remove the torch.exp from the return statement of the model.
NLLLoss expects log-probabilities, i.e. the output of log_softmax. Remove it and try again.
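In other words, a minimal sketch of the intended flow:

import torch
import torch.nn.functional as F

logits = torch.randn(1, 2)                       # stand-in for self.fc2(x)
log_probs = F.log_softmax(logits, dim=1)         # this is what the model should return
loss = F.nll_loss(log_probs, torch.tensor([0]))  # nll_loss expects log-probabilities

probs = torch.exp(log_probs)  # exponentiate only outside the loss path, to read off probabilities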

So the model is incapable of outputting [0, 1] at any point because of the way I’ve defined the output? Okay, that’s been removed and I reran the code.

I’m still getting some strange accuracy values:

[plot: training accuracy over training steps]

But now that I’ve removed the torch.exp call from the model, I’m applying it when evaluating the last two test images:

prediction = torch.exp(model(image))
print(prediction)

Variable containing:
 0.6477  0.3523
[torch.FloatTensor of size 1x2]

prediction = torch.exp(model(image2))
print(prediction)

Variable containing:
 0.6543  0.3457
[torch.FloatTensor of size 1x2]

I assume that’s okay to do as it’s not in the model itself?

By the way, I thought I’d let you know this is how my images are set up:

Data/
    Training/
        Cats  (159 images)
        Dogs  (60 images)
    Testing/
        Cats  (9 images)
        Dogs  (9 images)
    Post-training/
        1 cat image, 1 dog image