Poor performance compared to Matlab in simple classification

Kyle_Wang · December 13, 2021, 8:39am

I’m experimenting with a very simple image classification task: classifying chest X-rays as normal, bacterial infection, or viral infection.

Please see my GitHub repository: kail85/SimpleTorchClassification

My training script is in Train.ipynb. The data are simple grayscale images of size 256 x 256, split into train, val and test sets. In each set, the ratio of the 3 image categories is the same.
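For readers who want to reproduce a split in which each set keeps the same class ratio, here is a minimal sketch in plain Python. The function name `stratified_split`, the 80/10/10 fractions, and the toy label list are assumptions made for illustration; the post does not state the actual split sizes or how the split was produced.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists with matching class ratios.

    `fractions` is a hypothetical 80/10/10 split, not taken from the post.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut the same fractions from each.
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels standing in for the three chest X-ray categories.
labels = ["normal"] * 20 + ["bacterial"] * 40 + ["viral"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting per class rather than over the whole dataset is what keeps the category ratio identical across the three sets.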

I wrote the script in both PyTorch and Matlab. Using mobilenetv2, Matlab stops at epoch 9 with a final validation loss of 0.1714 and an accuracy of 82.91%.

In PyTorch, with the exact same training settings, the validation loss never drops below 1.0 and there is severe overfitting.

I’m not sure what happened. Is there anything wrong with my validation script? I tried mobilenetv3_small, mobilenetv3_large, and resnet18, and they all have the same issue.

Only alexnet reaches a validation loss of about 0.4, at epoch 20.

I’ve been stuck here for a few days. I’d appreciate any suggestions. Thank you.

Update: I put a breakpoint in the validation function, at the line

outputs_val = net(inputs_val)

its output is

tensor([[ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],...

Then I tried

net(torch.zeros_like(inputs_val))

it gives

tensor([[ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],
        [ 1.1345, -1.1818,  0.0358],...

I’m about to cry…
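The check above (real inputs and all-zero inputs giving identical rows) can be turned into a small reusable diagnostic. The sketch below measures how much the logits vary across a batch, and repeats the measurement on a copy of the network in train mode. Comparing the two modes is an assumption about where to look next (layers such as BatchNorm behave differently in train and eval mode), not a confirmed diagnosis of this problem. The helper `output_spread` and the tiny stand-in network are hypothetical and are not taken from the repository.

```python
import copy

import torch
import torch.nn as nn


def output_spread(net: nn.Module, inputs: torch.Tensor) -> float:
    """Largest per-class standard deviation of the logits across the batch.

    A value near 0 means every sample in the batch gets the same output.
    """
    with torch.no_grad():
        logits = net(inputs)
    return logits.std(dim=0).max().item()


# Hypothetical stand-in for the real model: a tiny CNN with BatchNorm.
torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 3),
)
batch = torch.randn(4, 1, 32, 32)  # stand-in for inputs_val

net.eval()
spread_eval = output_spread(net, batch)                    # real inputs, eval mode
spread_zero = output_spread(net, torch.zeros_like(batch))  # all-zero inputs

# Train-mode forward passes update BatchNorm running statistics,
# so probe a deep copy and leave the original network untouched.
probe = copy.deepcopy(net).train()
spread_train = output_spread(probe, batch)

print(f"eval: {spread_eval:.6f}  zeros: {spread_zero:.6f}  train: {spread_train:.6f}")
```

If the spread on real validation inputs is close to zero in eval mode but clearly non-zero in train mode, the layers that switch behavior between the two modes would be a reasonable place to look.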

MartinZhang · December 13, 2021, 9:14am

I am not sure. What learning rate are you using?