How to implement oversampling in CIFAR-10?

ptrblck · May 2, 2018, 11:38pm

That’s good to hear! I was checking it myself, since I wanted to have a look at the accuracy.
Would you mind sharing the accuracies you have achieved using the balanced and imbalanced datasets?

Josiane_Rodrigues · May 2, 2018, 11:48pm

I have not finished training yet, but on the balanced dataset I got 93% accuracy, while on the unbalanced dataset it’s at 81%. I’m still running the pre-training. When I finish training, I’ll post the final value here. Thanks again.

Josiane_Rodrigues · May 8, 2018, 4:49pm

Hi @ptrblck,

I achieved 84% accuracy in the final model with the unbalanced dataset. That is much lower than the value I got with the balanced dataset (93%), so I’m trying another oversampling strategy. However, I’m having a problem. Since the function I am using for oversampling comes from another library, I have to make some modifications to the data and then use the function, like this:

data, label = SMOTE().fit_sample(d2_train_data, dataset.train_labels)

where d2_train_data is the training data transformed into a 2D array so that it can be passed to the SMOTE function.
Then I return the data to the original format and load the dataset using the DataLoader.

dataset.train_data = torch.from_numpy(dataset.train_data)
dataset.train_labels = torch.from_numpy(dataset.train_labels)

dataset = torch.utils.data.TensorDataset(dataset.train_data, dataset.train_labels)

loader = DataLoader(dataset=dataset, batch_size=64, shuffle=True)

However, in the end it gives an error related to the weights of the network.

RuntimeError: Given groups=1, weight[64, 3, 7, 7], so expected input[64, 32, 32, 3] to have 3 channels, but got 32 channels instead

I think it’s due to the transform, which I cannot use after this modification. Am I doing something wrong?

ptrblck · May 9, 2018, 12:27am

That’s good news! Your SMOTE approach sounds interesting.

Your error is most likely due to the image loading.
It seems the channels are in dimension 3, while PyTorch needs them in dimension 1.
Try to permute your images where you are loading them:

image = image.permute(0, 3, 1, 2).contiguous()

Let me know if it helps!
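
A minimal sketch of the suggested fix, assuming the resampled images are `uint8` values in `[N, 32, 32, 3]` layout (random tensors stand in for the real data here). The conversion to float and the division by 255 are assumptions added for illustration, since a `TensorDataset` bypasses the torchvision transforms that would normally handle this:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the resampled data: 8 RGB images in NHWC layout, uint8.
images_nhwc = torch.randint(0, 256, (8, 32, 32, 3), dtype=torch.uint8)
labels = torch.randint(0, 10, (8,))

# Move the channels from dimension 3 to dimension 1 (NHWC -> NCHW).
images_nchw = images_nhwc.permute(0, 3, 1, 2).contiguous()

# Assumption: scale to [0, 1] floats, since no ToTensor() transform is applied.
images_nchw = images_nchw.float() / 255.0

dataset = TensorDataset(images_nchw, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)  # torch.Size([4, 3, 32, 32])
```

With the channels in dimension 1, a first convolution with weight `[64, 3, 7, 7]` receives the 3 input channels it expects.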

Josiane_Rodrigues · May 9, 2018, 2:37pm

Thanks. I’ll test it and get back to you later.

vfdev-5 · May 22, 2018, 2:43pm

Josiane, could you please provide more details on how you use SMOTE on images, if I understand correctly?
Thanks.

Josiane_Rodrigues · May 22, 2018, 3:47pm

Hi @vfdev-5,
Do you want to understand how SMOTE works, or how I implemented it in my code? I have not yet tested the solution that @ptrblck suggested for the error I reported above, as I am trying to solve another problem. But I will come back to this problem as soon as possible.

vfdev-5 · May 22, 2018, 4:11pm

@Josiane_Rodrigues I wanted to understand how SMOTE works on images and whether it really makes sense to do it on images (i.e. SMOTE on images = blending of images)?

Josiane_Rodrigues · May 22, 2018, 4:25pm

SMOTE augments the dataset with artificial examples created by interpolating between neighboring data points.
I’m not sure if this makes sense for images because, as you said, SMOTE will blend images. I have the impression that this does not work very well, but I wanted to test it anyway because there are works in the literature that use this technique on images and report favorable results.
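
To make the blending concrete, here is a rough sketch of the interpolation step at the heart of SMOTE, written in plain Python on tiny made-up vectors. It is not the imbalanced-learn implementation: the helper name `smote_like_sample` is hypothetical, and the neighbour search is a brute-force lookup of the single nearest neighbour.

```python
import random

def smote_like_sample(minority, rng):
    """Create one synthetic point between a random minority sample and its nearest neighbour."""
    base = rng.choice(minority)
    # Brute-force nearest neighbour, excluding the base point itself.
    others = [p for p in minority if p is not base]
    neighbour = min(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    # Interpolate: base + gap * (neighbour - base), with gap drawn uniformly from [0, 1).
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

rng = random.Random(0)
# Each "image" is a flattened vector of pixel intensities (only 4 pixels here).
minority = [[0.0, 0.0, 1.0, 1.0], [0.2, 0.0, 0.8, 1.0], [1.0, 1.0, 0.0, 0.0]]
synthetic = smote_like_sample(minority, rng)
print(synthetic)
```

Because every coordinate is a pixel, the synthetic sample is a pixel-wise mix of two existing images, which is why the result looks like a blend.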

ptrblck · May 22, 2018, 4:45pm

Could you post some papers on this topic, please?
I know I’ve read some a while ago and cannot find them!

Josiane_Rodrigues · May 22, 2018, 5:10pm

I am following this paper in my experiments (https://arxiv.org/abs/1710.05381). It compares some methods for handling unbalanced classes. It does not use SMOTE but references the method. I expressed myself poorly: I’m not sure whether there are works that use SMOTE, but I wanted to test SMOTE more out of curiosity, to see how the technique behaves on images.

Griffintaur · October 22, 2019, 6:25pm

@Josiane_Rodrigues @ptrblck I am also looking to use SMOTE for an image dataset. Can you let me know how you used SMOTE for an image dataset?

Ksalomon · May 8, 2020, 1:19am

Hi Josiane, how do you transform a tensor of images into a 2D array? Can you elaborate more on this?
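
One common way to do this, sketched here as an assumption rather than as the original poster's actual code, is to reshape the `[N, H, W, C]` image array to `[N, H*W*C]` before resampling and to reshape it back afterwards. The array below is random stand-in data with CIFAR-10's 32x32x3 image shape:

```python
import numpy as np

# Stand-in for CIFAR-10 training images: N images in [N, 32, 32, 3] layout.
images = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

# Flatten each image into one row so the data becomes a 2D array [N, 3072].
flat = images.reshape(len(images), -1)
print(flat.shape)  # (10, 3072)

# ... an oversampler that expects 2D input would be applied to `flat` here ...

# Restore the image layout; -1 lets NumPy infer the (possibly larger) sample count.
restored = flat.reshape(-1, 32, 32, 3)
print(restored.shape)  # (10, 32, 32, 3)
```

The round trip is lossless because `reshape` only changes how the same values are indexed.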

ARNAB_BANERJEE · February 22, 2021, 6:22pm

@Josiane_Rodrigues were you able to implement SMOTE on CIFAR? Did you get better results? I am facing a similar issue: Performance of SMOTE on CIFAR10 dataset

Gyuseong_Lee · December 4, 2021, 6:30pm

@ptrblck hello!
I registered on this platform because I wanted to let you know the result of the WeightedRandomSampler above!
After I train, I will let you know. This dataset is so epic and so suitable for this (some labels have no data, some have 30 samples, and some have 700 samples in the train set).

Gyuseong_Lee · December 5, 2021, 1:10am

just train : 0.47
random sample : 0.44

RAFAIL_MAHAMMADLI · January 10, 2022, 8:58pm

Hello all,

I want to apply the SMOTE technique to balance the minority classes.
I got the following error. Can someone help me solve it?

Thanks

RAFAIL_MAHAMMADLI · January 11, 2022, 11:51pm

Hello @ptrblck,

In this code, we only over-sample the minority classes. How can we under-sample the majority classes?

Can you please show it with code?

Thank you

ptrblck · January 12, 2022, 12:19am

The posted code balances the targets in each batch by using weights to sample each data point.
Are you looking for a way to sample all minority class samples once and undersample the majority classes, which would then yield fewer samples than were defined in the Dataset?
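
As a rough illustration of per-sample weighting (not the code posted earlier in the thread), each data point can be given a weight inversely proportional to its class frequency. This sketch uses only the standard library and made-up labels, with `random.choices` standing in for the sampler:

```python
import random
from collections import Counter

# Made-up imbalanced labels: class 0 is nine times more frequent than class 1.
labels = [0] * 900 + [1] * 100

# Weight each sample by the inverse of its class count.
class_counts = Counter(labels)
sample_weights = [1.0 / class_counts[y] for y in labels]

# Draw indices with replacement according to the weights.
rng = random.Random(0)
drawn = rng.choices(range(len(labels)), weights=sample_weights, k=10_000)
drawn_counts = Counter(labels[i] for i in drawn)
print(drawn_counts)  # roughly 5000 draws per class
```

In PyTorch, the same `sample_weights` list could be passed to `torch.utils.data.WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)`, and the sampler handed to the `DataLoader` through its `sampler` argument.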

RAFAIL_MAHAMMADLI · January 12, 2022, 12:38am

@ptrblck, thank you for the response. I am working on a multiclass classification problem with 3 classes. The training set contains 127 images belonging to the first class, 141 images belonging to the second class, and 257 images belonging to the third class. I need to use over-sampling and under-sampling techniques to see which works better for my data.
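
For the class sizes mentioned here (127, 141 and 257 images), random under-sampling could be sketched as follows: keep a random subset of each class that is no larger than the smallest class. The labels are synthetic and the helper name `undersample_indices` is hypothetical.

```python
import random
from collections import defaultdict

def undersample_indices(labels, rng):
    """Return indices that keep the same number of samples (the minority count) per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))  # sample without replacement
    return sorted(kept)

# Synthetic labels matching the class sizes from the post.
labels = [0] * 127 + [1] * 141 + [2] * 257
kept = undersample_indices(labels, random.Random(0))
print(len(kept))  # 381 = 3 classes * 127 samples
```

The resulting indices could then be wrapped with `torch.utils.data.Subset(dataset, kept)` to build a balanced training set, at the cost of discarding some majority-class images.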