and I use nn.CrossEntropyLoss() as the loss function, providing the labels as class indices (0 or 1), but the performance is very poor (worse than a dummy classifier). I would like to verify that the ResNet modification is correct for binary classification.
Are you using a pretrained model?
Are you modifying the net after loading the weights?
Have you trained the model after modifying it?
Why are you changing model.conv1?
No, I don't use pretrained models, so the training is from scratch.
I have modified model.conv1 to have a single channel input.
I have trained the model with these modifications, but the predicted labels heavily favor one of the classes, so accuracy cannot go beyond 50%; since my train and test data are balanced, the classifier is effectively doing nothing.
How do you initialize the weights of the layers you added?
Do you normalize the inputs?
How many examples do you have?
Do you shuffle your training set?
I didn't do any specific initialization; I just use resnet18(), which I think handles the weight initialization itself.
I didn't do any normalization since my inputs are not actual images; they are very sparse 784x162 matrices. Almost all values are zero except a few entries with real values. Could that be a reason?
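Even for non-image inputs, a simple standardization using training-set statistics is often worth trying. A hypothetical sketch (variable names and the dummy sparse data are illustrative, not from the post):

```python
import torch

# Dummy sparse data: ~1% nonzero entries, shaped like the 784x162
# matrices described above (small batch for illustration).
X_train = torch.randn(16, 1, 784, 162) * (torch.rand(16, 1, 784, 162) > 0.99)

# Compute statistics on the training set only, then apply the same
# transform to validation/test data.
mean = X_train.mean()
std = X_train.std().clamp_min(1e-8)  # guard against division by zero
X_train_norm = (X_train - mean) / std
```

The same `mean` and `std` must then be reused for the test set; recomputing them per split would leak test statistics.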
I have 2400 samples for training, which is probably very small for such networks, but the results are quite far from what I expected.
I have shuffled the data for training.
I have checked the values with model.fc.weight and model.conv1.weight, and they are initialized to non-zero values.
I am asking for more data, but the data generation process is a bit expensive. The other problem is the sparseness of the matrices; do you think ResNet works fine with sparse data?
Thank you
The only modification you really need is in the linear layer, which you have already done, so that should be fine. Maybe it's an issue with your dataset?
I have implemented ResNet-34 (and 50, 101, and 152) with some slight modifications from there, and it works fine for binary classification. So I don't think it's an issue with the architecture. I have an example here (binary classification on gender labels, getting ~97% accuracy):
I suggest maybe trying your implementation on a different dataset where you know you should be getting good results to see if there’s maybe an implementation bug.
You might need to put the ResNet's batch norm layers into eval mode. This made a massive difference for me when using ResNet as a feature extractor.
Yes, I think the problem is the size of the dataset. Thanks for your suggestion; I'll try another standard dataset.
I just see above that you only have 2400 examples, which could be the main reason like you suggest.
Almost all values being 0 could be a problem, but it's probably not the main reason; MNIST images also contain lots of 0's. Another issue, besides the small dataset size, is that 784x162 is very large for a convnet (typically, even for images, standard ResNets for e.g., face recognition operate on images between ~60x60 and ~200x200).
Since you are mentioning that these are not images, I wonder if it is a tabular dataset, in which case you might be better off using a network with only a few (e.g., 1-3) fully connected layers with dropout.
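As a rough illustration of that suggestion, a small fully connected network with dropout might look like this (layer sizes are illustrative and not tuned; only the 784x162 input shape and the two-class output come from the thread):

```python
import torch
import torch.nn as nn

# Small MLP for tabular inputs: flatten the 784x162 matrix and use a
# few fully connected layers with dropout instead of a deep convnet.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784 * 162, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(64, 2),  # two logits for nn.CrossEntropyLoss
)

x = torch.randn(4, 1, 784, 162)  # dummy batch
logits = mlp(x)                  # shape (4, 2)
```

With only 2400 samples, the dropout layers (and possibly weight decay) would be doing most of the work against overfitting.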
Thank you so much for your suggestions, yes my data is tabular with integer numbers.
And I have switched to working with 1 or 2 CNN layers followed by 1-3 fully connected layers. Thank you.
This is not the issue here; clearly one must use eval() mode for validation/testing.
The issue is using eval() during training as well: it is common practice to keep these layers in eval mode when fine-tuning, but not when training from scratch.
Yes, for some architectures it might not matter, but ResNet has BatchNorm layers, so the model should be set to train() during training so these layers are updated correctly, and to eval() during testing so their statistics are not updated on the test set.
There are other layers besides BatchNorm that behave differently in train and eval mode; for instance, dropout layers or layers with spectral norm. Therefore it's a very good practice to always set the model to eval() when testing and to train() when training.
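The pattern discussed above can be demonstrated with a toy model (assumed for illustration) containing both BatchNorm and Dropout:

```python
import torch
import torch.nn as nn

# Toy model with layers that behave differently in train vs eval mode.
model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(0.5))

model.train()            # training: BN uses batch statistics and updates
x = torch.randn(32, 10)  # its running stats; dropout randomly zeroes units
_ = model(x)

model.eval()             # testing: BN uses its accumulated running stats;
with torch.no_grad():    # dropout is disabled
    out1 = model(x)
    out2 = model(x)
# In eval mode the forward pass is deterministic, so out1 equals out2.
```

In train mode the two forward passes would generally differ because of the random dropout mask.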