Binary classification with CNN from scratch

I’ve just switched from Keras to PyTorch and have been following some tutorials, and most of it makes sense. But all the tutorials I could find cover multiclass problems like MNIST or CIFAR-10, or transfer learning. Today I want to try the good old dog vs. cat problem from scratch. The last time I worked on this specific problem in Keras I got an accuracy above 90%, but in PyTorch I get an accuracy of 50%, if I am lucky. So I assume that I’m doing something wrong :smile:
I have all the images saved in the folder ‘Cat_Dog_data’, with subfolders test and train, each of which contains the folders cat and dog. I have tried making the network simpler, removing dropout, shuffling the dataset, changing the loss, changing the way I measure the loss/accuracy, etc., but it doesn’t help :frowning:

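import torch
from torchvision import datasets, transforms

data_dir = 'Cat_Dog_data'
# The rest of the original transform list was cut off in the post; CenterCrop and
# ToTensor are assumed here so that the inputs are 128x128 tensors.
transform = transforms.Compose([transforms.Resize(128),
                                transforms.CenterCrop(128),
                                transforms.ToTensor()])
train_data = datasets.ImageFolder(data_dir + '/train', transform=transform)
test_data = datasets.ImageFolder(data_dir + '/test', transform=transform)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=True)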

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
   def __init__(self):
       super(Net, self).__init__()
       self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
       self.conv2 = nn.Conv2d(32, 32, 3, padding=1)
       self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
       self.pool = nn.MaxPool2d(2, 2)
       self.fc1 = nn.Linear(64*4*4*4*4, 64)
       self.fc2 = nn.Linear(64, 1)
       self.dropout = nn.Dropout(0.1)

   def forward(self, x):
       # add sequence of convolutional and max pooling layers
       x = self.pool(F.relu(self.conv1(x)))
       x = self.pool(F.relu(self.conv2(x)))
       x = self.pool(F.relu(self.conv3(x)))
       x = x.view(-1, 64 * 4 * 4*4*4)
       x = self.dropout(x)
       x = F.relu(self.fc1(x))
       x = self.dropout(x)
       x = F.relu(self.fc2(x))
       x = F.sigmoid(x)
       return x

model = Net()

# move the model to the GPU if CUDA is available
if train_on_gpu:
   model.cuda()

def weight_init(m):
   if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
       nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))


n_epochs = 5 # you may increase this number to train a final model
valid_loss_min = float('inf') # track change in validation loss
from torch.autograd import Variable
for epoch in range(1, n_epochs+1):

   # keep track of training and validation loss
   train_loss = 0.0
   valid_loss = 0.0
   accuracy = 0.0
   # train the model #
   for data, target in train_loader:
       # move tensors to GPU if CUDA is available
       target = target.float()

       if train_on_gpu:
           data, target = data.cuda(), target.cuda()
       # clear the gradients of all optimized variables
       optimizer.zero_grad()
       # forward pass: compute predicted outputs by passing inputs to the model
       output = model(data)
       # calculate the batch loss
       loss = criterion(output, target)
       # backward pass: compute gradient of the loss with respect to model parameters
       loss.backward()
       # perform a single optimization step (parameter update)
       optimizer.step()
       # update training loss
       train_loss += loss.item()*data.size(0)
      # print(((output.squeeze() > 0.5) == target.byte()).sum().item() / target.shape[0])

   # validate the model #

   for data, target in test_loader:
       # move tensors to GPU if CUDA is available
       if train_on_gpu:
           data, target = data.cuda(), target.cuda()
       # forward pass: compute predicted outputs by passing inputs to the model
       target = target.float()
       output = model(data)
       # calculate the batch loss
       loss = criterion(output, target.view_as(output))
       # update average validation loss 
       valid_loss += loss.item()*data.size(0)
       t = Variable(torch.FloatTensor([0.5]))  # threshold
       out = (output > t.cuda(non_blocking=True)).float() * 1
      # print(out)
       equals = target.float()  ==  out.t()
      # print(equals)
       accuracy += (torch.sum(equals).cpu().numpy())
      # print(equals)
      # print(target)
   # calculate average losses
   train_loss = train_loss/len(train_loader.dataset)
   valid_loss = valid_loss/len(test_loader.dataset)
   accuracy = accuracy/len(test_loader.dataset)
   # print training/validation statistics 
   print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}  \tAccuracy: {:.6f}  '  .format(
       epoch, train_loss, valid_loss, accuracy))
   # save model if validation loss has decreased
   if valid_loss <= valid_loss_min:
       print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
       valid_loss_min, valid_loss))
       valid_loss_min = valid_loss

Skimming through your code I couldn’t find anything obviously wrong.
I assume you are using nn.BCELoss as the criterion.
Which optimizer are you using and how did you set the hyperparameters (learning rate etc.)?
Also, why do you have to reshape the target in your validation loop, while the training loop seems to be alright?
Could there be some kind of shape mismatch?
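To illustrate the shape question, here is a minimal sketch with made-up tensors (the batch size of 4 and the random values are assumptions, not taken from the post). The model above returns outputs of shape [batch_size, 1], while ImageFolder yields integer labels of shape [batch_size], so the target has to be converted to float and reshaped before it goes into nn.BCELoss:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Hypothetical batch: 4 samples, sigmoid outputs of shape [4, 1]
output = torch.sigmoid(torch.randn(4, 1))
# ImageFolder-style integer class labels of shape [4]
target = torch.tensor([0, 1, 1, 0])

# Convert to float and give the target the same shape as the output
target = target.float().view_as(output)

loss = criterion(output, target)
print(output.shape, target.shape, loss.item())
```

Applying the same reshape in both the training and the validation loop keeps the two losses comparable.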

As a small side note: Variables are deprecated since PyTorch 0.4.0, so you don’t have to wrap tensors in them anymore. If you are using an older version, you’ll find the install instructions on the PyTorch website.
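As a rough sketch of what the thresholding could look like without Variable (the tensors below are invented for illustration), a plain Python float works directly in the comparison:

```python
import torch

# Hypothetical sigmoid outputs of shape [4, 1] and float targets of shape [4]
output = torch.tensor([[0.9], [0.2], [0.6], [0.4]])
target = torch.tensor([1.0, 0.0, 0.0, 0.0])

# Compare against a plain float; no Variable or extra tensor needed
preds = (output > 0.5).float()

# Align the shapes before comparing to avoid accidental broadcasting
correct = (preds == target.view_as(preds)).sum().item()
accuracy = correct / target.size(0)
print(accuracy)
```

The comparison result lives on the same device as output, so no explicit .cuda() call is needed for the threshold.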

Hello, I am very new to PyTorch. I have a question about using a CNN for image classification. Suppose each image belongs to multiple classes, so its label is not a single value like 0 or 1 but a label vector like [0 1 1]. How can I classify such images with a CNN in PyTorch? I would really appreciate your help. Thank you.

You can use nn.BCEWithLogitsLoss as the criterion and return logits from the model in the shape [batch_size, nb_classes]. The target tensor would be a multi-hot encoded tensor in the same shape ([batch_size, nb_classes]) containing values in [0, 1] where a 1 would indicate the class is “active”.
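A minimal sketch of these shapes, assuming a stand-in nn.Linear model and made-up sizes (batch_size=4, nb_classes=3, 10 input features); a real CNN would only need to end in a layer with nb_classes outputs and no sigmoid:

```python
import torch
import torch.nn as nn

batch_size, nb_classes = 4, 3

# Stand-in for a real model: anything that maps inputs to
# logits of shape [batch_size, nb_classes]
model = nn.Linear(10, nb_classes)
x = torch.randn(batch_size, 10)

# Multi-hot targets: [0, 1, 1] means classes 1 and 2 are active
target = torch.tensor([[0., 1., 1.],
                       [1., 0., 0.],
                       [0., 0., 1.],
                       [1., 1., 0.]])

criterion = nn.BCEWithLogitsLoss()
logits = model(x)                 # raw scores, no sigmoid applied
loss = criterion(logits, target)  # sigmoid is applied inside the loss
loss.backward()
print(logits.shape, loss.item())
```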

For example, in Keras, the MLP model for multi-label images can be defined as:

(n_outputs is equal to the number of labels for each image)

from keras.models import Sequential
from keras.layers import Dense

def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

After predicting on a new image with model.predict(new_image), we get the probability of each class label, e.g. [0.9, 0.7, 0.2].

Now, I want to ask how I can change an MLP or a CNN in PyTorch for such images with multiple labels, and where I should use nn.BCEWithLogitsLoss?

class MLP(nn.Module):
    def __init__(self, n_input, n_hidden, n_out):
        super(MLP, self).__init__()
        self.layer_input = nn.Linear(n_input, n_hidden)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout()
        self.layer_hidden = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # flatten [batch, channels, height, width] into [batch, features]
        x = x.view(-1, x.shape[1]*x.shape[-2]*x.shape[-1])
        x = self.layer_input(x)
        x = self.dropout(x)
        x = self.relu(x)
        x = self.layer_hidden(x)
        return x

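or for CNN:

class CNN(nn.Module):
    # The class header and constructor signature were lost in the post;
    # num_channels and num_classes are assumed constructor arguments.
    def __init__(self, num_channels, num_classes):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(num_channels, 10, kernel_size=6)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=6)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(20, 50)
        self.fc2 = nn.Linear(50, num_classes)  # one output per label

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, x.shape[1]*x.shape[2]*x.shape[3])
        x = F.relu(self.fc1(x))
        # the arguments were cut off in the post; training=self.training is assumed
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x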

Your code looks alright and you should be able to train these models.
To get the probabilities of the outputs, use:

output = model(input)
preds = torch.sigmoid(output)

Use nn.BCEWithLogitsLoss as the loss function:

criterion = nn.BCEWithLogitsLoss()

output = model(input)
preds = torch.sigmoid(output)

loss = criterion(output, target)

Make sure to pass the logits to this loss function, not the probabilities.
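As a quick sanity check of why this matters, the sketch below uses random stand-in logits and targets (both hypothetical) to show that nn.BCEWithLogitsLoss on logits matches nn.BCELoss on the sigmoid probabilities, whereas passing probabilities to nn.BCEWithLogitsLoss applies the sigmoid twice:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(4, 3)                     # hypothetical logits
target = torch.randint(0, 2, (4, 3)).float()   # hypothetical multi-hot targets

# Correct: logits go into BCEWithLogitsLoss
loss_logits = nn.BCEWithLogitsLoss()(output, target)
# Equivalent: probabilities go into plain BCELoss
loss_probs = nn.BCELoss()(torch.sigmoid(output), target)
print(torch.allclose(loss_logits, loss_probs, atol=1e-5))

# Incorrect: probabilities into BCEWithLogitsLoss (sigmoid applied twice)
loss_wrong = nn.BCEWithLogitsLoss()(torch.sigmoid(output), target)
print(loss_logits.item(), loss_wrong.item())

# Predicted label vector per sample: threshold the probabilities at 0.5
labels = (torch.sigmoid(output) > 0.5).int()
print(labels)
```

The 0.5 threshold is only a common default; it can be tuned per class on a validation set.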