and I use nn.CrossEntropyLoss() as the loss function, providing the labels as class indices (0 or 1), but the performance is very poor (worse than a dummy classifier). I would like to verify that the ResNet modification is correct for binary classification.
Are you using a pretrained model?
Are you modifying the net after loading the weights?
Have you trained the model after modifying it?
Why are you changing model.conv1?
No, I don't use pretrained models, so the training is from scratch.
I have modified model.conv1 to have a single channel input.
I have trained the model with these modifications, but the predicted labels heavily favor one of the classes, so accuracy cannot go beyond 50%; since my train and test data are balanced, the classifier is effectively doing nothing.
How do you initialize the weights of the layers you added?
Do you normalize the inputs?
How many examples do you have?
Do you shuffle your training set?
I didn't do any specific initialization; I just use resnet18(), which I think handles the weight initialization itself.
I didn't do any normalization since my inputs are not actual images; they are very sparse 784x162 matrices. Almost all values are zero except a few entries with real values. Could that be a reason?
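Even for non-image inputs, a simple standardization using training-set statistics is often worth trying. A hypothetical sketch (variable names and the dummy sparse data are illustrative, not from the post):

```python
import torch

# Dummy sparse data: ~1% nonzero entries, shaped like the 784x162
# matrices described above (small batch for illustration).
X_train = torch.randn(16, 1, 784, 162) * (torch.rand(16, 1, 784, 162) > 0.99)

# Compute statistics on the training set only, then apply the same
# transform to validation/test data.
mean = X_train.mean()
std = X_train.std().clamp_min(1e-8)  # guard against division by zero
X_train_norm = (X_train - mean) / std
```

The same `mean` and `std` must then be reused for the test set; recomputing them per split would leak test statistics.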
I have 2400 samples for training, which is probably very small for such networks, but the results are quite far from what I expected.
I have shuffled the data for training.
I have checked the values with model.fc.weight and model.conv1.weight, and they are initialized to non-zero values.
I am asking for more data, but the data generation process is a bit expensive. The other problem is the sparseness of the matrices; do you think ResNet works fine with sparse data?
Thank you
The only modification you really need is in the linear layer, which you have already done, so that should be fine. Maybe it's an issue with your dataset?
I have implemented ResNet-34 (and 50, 101, and 152) with some slight modifications from there, and it works fine for binary classification. So I don't think it's an issue with the architecture. I have an example here (binary classification on gender labels, getting ~97% accuracy):
I suggest maybe trying your implementation on a different dataset where you know you should be getting good results to see if there’s maybe an implementation bug.
You might need to put the ResNet's batch norm layers into eval mode. This made a massive difference for me when using ResNet as a feature extractor.
Yes, I think the problem is the size of the dataset. Thanks for your suggestion; I'll try another standard dataset.
I just see above that you only have 2400 examples, which could be the main reason like you suggest.
Almost all values being 0 could be a problem, but it's probably not the main reason; MNIST images also contain lots of 0's. Another issue, besides the small dataset size, is that 784x162 is very large for a convnet (typically, even for images, standard ResNets for e.g., face recognition operate on images between ~60x60 and ~200x200).
Since you are mentioning that these are not images, I wonder if it is a tabular dataset, in which case you might be better off using a network with only a few (e.g., 1-3) fully connected layers with dropout.
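As a rough illustration of that suggestion, a small fully connected network with dropout might look like this (layer sizes are illustrative and not tuned; only the 784x162 input shape and the two-class output come from the thread):

```python
import torch
import torch.nn as nn

# Small MLP for tabular inputs: flatten the 784x162 matrix and use a
# few fully connected layers with dropout instead of a deep convnet.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784 * 162, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(64, 2),  # two logits for nn.CrossEntropyLoss
)

x = torch.randn(4, 1, 784, 162)  # dummy batch
logits = mlp(x)                  # shape (4, 2)
```

With only 2400 samples, the dropout layers (and possibly weight decay) would be doing most of the work against overfitting.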
Thank you so much for your suggestions, yes my data is tabular with integer numbers.
And I have switched to working with 1 or 2 CNN layers followed by 1-3 fully connected layers. Thank you.
This is not the issue here; clearly one must use eval() mode for validation/testing.
The issue is using eval() during training as well: it is common practice to keep these layers in eval mode when fine-tuning, but not when training from scratch.
Yes, for some architectures it might not matter, but ResNet has BatchNorm layers, so the model should be set to train() during training so these layers are updated correctly, and to eval() during testing so their statistics are not updated on the test set.
There are other layers besides BatchNorm that behave differently in train and eval mode; for instance, dropout layers or layers with spectral norm. Therefore it's a very good practice to always set the model to eval() when testing and to train() when training.
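The pattern discussed above can be demonstrated with a toy model (assumed for illustration) containing both BatchNorm and Dropout:

```python
import torch
import torch.nn as nn

# Toy model with layers that behave differently in train vs eval mode.
model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(0.5))

model.train()            # training: BN uses batch statistics and updates
x = torch.randn(32, 10)  # its running stats; dropout randomly zeroes units
_ = model(x)

model.eval()             # testing: BN uses its accumulated running stats;
with torch.no_grad():    # dropout is disabled
    out1 = model(x)
    out2 = model(x)
# In eval mode the forward pass is deterministic, so out1 equals out2.
```

In train mode the two forward passes would generally differ because of the random dropout mask.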