How to implement oversampling in Cifar-10?

That’s good to hear! I was checking it myself, since I wanted to have a look at the accuracy.
Would you mind sharing the accuracies you have achieved using the balanced and imbalanced datasets?

I have not finished training yet, but on the balanced dataset I got 93% accuracy, while the unbalanced dataset is at 81%. I’m still running the pre-training; when I finish, I’ll post the final value here. Thanks again.

Hi @ptrblck,

I achieved 84% accuracy in the final model with the unbalanced dataset. That is much lower than the value I got with the balanced dataset (93%), so I’m trying another oversampling strategy. However, I’m having a problem. Since the function I am using for oversampling comes from another library, I have to make some modifications to the data and then call the function, like this:

data, label = SMOTE().fit_sample(d2_train_data, dataset.train_labels)

where d2_train_data is the training data reshaped into a 2D array so that it can be passed to the SMOTE function.
Then I return the data to the original format and load the dataset using the DataLoader.

dataset.train_data = torch.from_numpy(dataset.train_data)
dataset.train_labels = torch.from_numpy(dataset.train_labels)

dataset = torch.utils.data.TensorDataset(dataset.train_data, dataset.train_labels)

loader = DataLoader( dataset=dataset, batch_size=64, shuffle=True)

However, in the end it gives an error related to the weights of the network.

RuntimeError: Given groups=1, weight[64, 3, 7, 7], so expected input[64, 32, 32, 3] to have 3 channels, but got 32 channels instead

I think it’s due to the transform, which I can no longer use after this modification. Am I doing something wrong?

That’s good news! Your SMOTE approach sounds interesting.

Your error is most likely due to the image loading.
It seems the channels are in dimension 3, while PyTorch needs them in dimension 1.
Try to permute your images where you are loading them:

image = image.permute(0, 3, 1, 2).contiguous()
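As a minimal sketch of what the permute does, the snippet below uses a dummy batch with the same shape as in the error message. The tensor and layer names are illustrative, not taken from the original code, and the convolution only mirrors the weight shape `[64, 3, 7, 7]` reported in the traceback.

```python
import torch

# Dummy batch shaped like the input in the error message: (N, H, W, C).
batch_nhwc = torch.zeros(64, 32, 32, 3)

# Move the channel axis from position 3 to position 1: (N, C, H, W).
batch_nchw = batch_nhwc.permute(0, 3, 1, 2).contiguous()

# Illustrative conv layer whose weight has shape [64, 3, 7, 7].
conv = torch.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7)

# Conv layers expect floating point input, so cast before the forward pass.
out = conv(batch_nchw.float())
print(batch_nchw.shape, out.shape)
```

If the images are stored as uint8 arrays, as the raw CIFAR-10 data usually is, the cast to float (and any normalization that the transform used to apply) has to be done by hand as well.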

Let me know if it helps!

Thanks. I’ll test it and get back to you later.

Josiane, if I understand correctly, you are applying SMOTE to images. Could you please provide more details on how you do that?
Thanks.

Hi @vfdev-5,
Do you want to understand how SMOTE works, or how I implemented it in my code? I have not yet tested the solution that @ptrblck suggested for the error I reported above, since I am trying to solve another problem. But I will come back to this problem as soon as possible.

@Josiane_Rodrigues I wanted to understand how SMOTE works on images and whether it really makes sense to apply it to images (i.e. SMOTE on images = blending of images)?

SMOTE augments the dataset with artificial examples created by interpolating between neighboring data points.
I’m not sure if this makes sense for images because, as you said, SMOTE will blend images. I have the impression that this does not work very well, but I wanted to test it anyway because there are works in the literature that use this technique on images and report favorable results.
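To make the "blending" concrete, here is a minimal sketch of the interpolation step only. It assumes two images of the same class have already been chosen as neighbors; real SMOTE picks the neighbor among the k nearest samples of the minority class, which this sketch skips. The function name `interpolate` and the dummy images are illustrative.

```python
import numpy as np

def interpolate(x, neighbor, lam):
    """Return a point on the segment between x and neighbor (0 <= lam <= 1)."""
    return x + lam * (neighbor - x)

rng = np.random.default_rng(0)

# Two dummy "images" standing in for a sample and one of its neighbors.
img_a = np.zeros((32, 32, 3), dtype=np.float32)
img_b = np.ones((32, 32, 3), dtype=np.float32)

# SMOTE-style interpolation works on flat feature vectors.
lam = rng.uniform(0.0, 1.0)
synthetic = interpolate(img_a.ravel(), img_b.ravel(), lam).reshape(32, 32, 3)
```

Each pixel of the synthetic image is a weighted average of the two source pixels, which is why the result looks like a cross-fade between the two images rather than a new natural image.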

Could you post some papers on this topic, please?
I know I’ve read some a while ago, but I cannot find them!

I am following this paper in my experiments (https://arxiv.org/abs/1710.05381). It compares several methods for dealing with unbalanced classes. It does not use SMOTE, but it references the method. I expressed myself poorly: I’m not sure whether there are works that use SMOTE on images, but I wanted to test SMOTE mostly out of curiosity, to see how this technique behaves on images.

@Josiane_Rodrigues @ptrblck I am also looking to use SMOTE for an image dataset. Can you let me know how you used SMOTE on an image dataset?

Hi Josiane, how do you transform a tensor of images into a 2D array? Can you elaborate on this?
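One plausible reading of the earlier post, sketched here with NumPy only, is that each image is flattened into one row before resampling and reshaped back afterwards. The array names and the dummy data are assumptions, and the resampling step itself is left as a placeholder.

```python
import numpy as np

# Stand-in for CIFAR-10 training images: (N, 32, 32, 3), uint8.
images = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

# Flatten every image into one row: (N, 32 * 32 * 3) = (N, 3072).
n_samples = images.shape[0]
flat = images.reshape(n_samples, -1)

# A resampler such as imbalanced-learn's SMOTE (fit_resample in recent
# releases) would be applied to `flat` and the labels here. It returns a
# 2D array that may have more rows than the input. Placeholder only:
resampled = flat

# Restore the image layout; -1 lets NumPy infer the new number of samples.
restored = resampled.reshape(-1, 32, 32, 3)
```

The restored array is still channels-last, so it needs the permute discussed above before it is fed to a PyTorch convolution.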

@Josiane_Rodrigues were you able to implement SMOTE on CIFAR? Did you get better results? I am facing a similar issue: Performance of SMOTE on CIFAR10 dataset

@ptrblck hello!
I registered on this platform because I wanted to let you know the result of the WeightedRandomSampler approach above!
After I train, I will let you know. This dataset is so epic and so well suited for this (some labels have no data, some have 30 samples, and some have 700 samples in the training set).

just train : 0.47
random sample : 0.44

Hello all,

I want to apply the SMOTE technique to balance the minority classes.
I got the following error. Can someone help me solve it?

Thanks

Hello @ptrblck,

In this code, we only over-sample the minority classes. How can we under-sample the majority classes?

Can you please show it with the code?

Thank you

The posted code balances the targets in each batch by using weights to sample each data point.
Are you looking for a way to sample all minority class samples once and undersample the majority classes, which would then yield fewer samples than were defined in the Dataset?

@ptrblck, thank you for the response. I am working on a multiclass classification problem with 3 classes. The training set contains 127 images belonging to the first class, 141 to the second class, and 257 to the third class. I need to try both over-sampling and under-sampling to see which technique works better for my data.
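A minimal sketch of both options for the class counts mentioned above (127, 141, 257) could look as follows. The dummy tensors, the variable names, and the choice of sizes are assumptions made for illustration: over-sampling draws with replacement until every class is roughly as large as the largest one, and under-sampling keeps a random subset of each class equal in size to the smallest one.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Dummy dataset with the class counts from the post above.
counts = [127, 141, 257]
targets = torch.cat(
    [torch.full((n,), c, dtype=torch.long) for c, n in enumerate(counts)]
)
data = torch.randn(len(targets), 3, 8, 8)  # small fake images
dataset = TensorDataset(data, targets)
class_counts = torch.bincount(targets)

# Over-sampling: weight each sample by the inverse frequency of its class
# and draw with replacement, so minority samples are repeated.
sample_weights = 1.0 / class_counts[targets].float()
n_over = int(class_counts.max()) * len(counts)
over_sampler = WeightedRandomSampler(
    sample_weights, num_samples=n_over, replacement=True
)
over_loader = DataLoader(dataset, batch_size=64, sampler=over_sampler)

# Under-sampling: keep a random subset of each class, sized like the
# smallest class, and drop the rest.
n_min = int(class_counts.min())
keep = torch.cat([
    torch.nonzero(targets == c).flatten()[
        torch.randperm(int(class_counts[c]))[:n_min]
    ]
    for c in range(len(counts))
])
under_dataset = Subset(dataset, keep.tolist())
under_loader = DataLoader(under_dataset, batch_size=64, shuffle=True)
```

With over-sampling the class balance holds only in expectation, since samples are drawn at random, while the under-sampled subset is exactly balanced but discards part of the majority classes.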