I was working on applying SMOTE to the CIFAR-10 dataset. I artificially introduced class imbalance by reducing the first class to a minority class (555 training examples) while keeping all training examples of the other classes intact. I then applied SMOTE to the minority class to remove the imbalance. I was mainly interested in the per-class accuracy of the minority class. After running the code, I got the following results:
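For reference, here is a minimal NumPy sketch of the interpolation step SMOTE performs on the flattened minority-class images. The `smote` helper and the 5000-per-class target are my illustrative assumptions, not my actual code (in practice I used the standard implementation, i.e. imbalanced-learn's `SMOTE.fit_resample`):

```python
import numpy as np

def smote(X, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest neighbours
    (the core SMOTE idea; a toy sketch, not the imbalanced-learn API)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # squared Euclidean distances via the dot-product identity (memory-friendly)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)           # exclude each point from its own neighbours
    nn = np.argsort(d2, axis=1)[:, :k]     # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)  # base sample for each synthetic point
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))           # interpolation factor in [0, 1)
    return X[base] + gap * (X[nbr] - X[base])

# minority class: 555 flattened CIFAR-10 images (32 * 32 * 3 = 3072 features)
X_min = np.random.default_rng(1).random((555, 3072))
X_syn = smote(X_min, n_new=5000 - 555)     # top up to the majority-class count
print(X_syn.shape)                          # (4445, 3072)
```

Note that this interpolation happens in raw pixel space, so each synthetic sample is a linear blend of two real images, which may be relevant to the results below.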
Accuracy (recall) of the minority class (first class) when using the entire CIFAR-10 dataset: 87.9%
Accuracy (recall) of the minority class (first class) when using the imbalanced CIFAR-10 dataset: 50.9%
Accuracy (recall) of the minority class (first class) when using the SMOTE-balanced CIFAR-10 dataset: 40.6%
The accuracy of the other classes does not change much across these scenarios.
While training with the SMOTE-balanced dataset, I noticed that the validation accuracy is very high (almost 1.0), whereas the test accuracy is as low as 40%. This suggests the model is overfitting this class.
I wonder whether SMOTE is a good technique for CIFAR-10, or for image datasets in general. If anyone has tried SMOTE on CIFAR and obtained better results than simply training on the imbalanced dataset, please let me know how I can improve the class accuracies.