WeightedRandomSampler on sequential data

My dataset contains handwritten text line images; an example is shown below:

I feed these images to a CNN for feature extraction, then pass the features to a BLSTM encoder and an LSTM decoder, which decodes a single character per time step. We have 6,161 line images in the train set, but the dataset has a long tail, as can be seen in the histogram below.

The dataset contains 79 classes. Samples don't have a single target; instead, each target consists of a sequence of classes. WeightedRandomSampler takes a weight per sample (typically derived from class frequencies) and tries to draw batches with a balanced number of samples based on those weights. Now the question is: can WeightedRandomSampler work on sequential data?
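For context, `torch.utils.data.WeightedRandomSampler` expects one weight per *sample*, not per class. With sequence targets there is no single class per sample, so any per-sample weight has to be derived from the whole label sequence somehow, e.g. by averaging the inverse class frequencies of the characters it contains. A minimal pure-Python sketch of that heuristic (the mean aggregation is an ad-hoc assumption, not an established recipe):

```python
from collections import Counter

def inverse_class_frequencies(label_sequences):
    """Count every class occurrence across all sequences and
    return weight[c] = 1 / count[c] for each class c."""
    counts = Counter(c for seq in label_sequences for c in seq)
    return {c: 1.0 / n for c, n in counts.items()}

def sample_weights(label_sequences):
    """One weight per sample: the mean inverse frequency of the
    classes in its label sequence (ad-hoc aggregation choice)."""
    class_w = inverse_class_frequencies(label_sequences)
    return [sum(class_w[c] for c in seq) / len(seq)
            for seq in label_sequences]

# Toy targets: class 0 dominates, class 2 is rare.
targets = [[0, 0, 1], [0, 1, 0], [0, 2]]
weights = sample_weights(targets)
# The resulting list could then be passed to
# torch.utils.data.WeightedRandomSampler(weights,
#     num_samples=len(weights), replacement=True)
```

The sample containing the rare class 2 ends up with the largest weight, so it would be oversampled; note, however, that oversampling it also re-draws all the frequent characters it contains, which is exactly why per-sample weighting is a blunt instrument for sequence labels.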

Hi @ptrblck, I have been following your answers in some of the threads related to WeightedRandomSampler. Could you advise whether there is any benefit to using WeightedRandomSampler in the situation described above?

Based on the description of your use case, it seems you are working on a multi-label classification problem. In that case, I don't think WeightedRandomSampler can easily be used to balance the batches. This post explains other methods by which such use cases can potentially be balanced.
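One commonly suggested alternative to resampling (an assumption here, since the linked post is not shown) is to weight the loss per class instead, so that rare characters contribute more to the gradient. A small pure-Python sketch of class-weighted negative log-likelihood over the time steps of one sequence, analogous to the `weight` argument of `torch.nn.CrossEntropyLoss`:

```python
import math

def weighted_nll(log_probs, targets, class_weights):
    """Weighted mean NLL over time steps.
    log_probs: list of dicts {class_id: log probability}, one per step
    targets:   list of target class ids, one per step
    class_weights: dict {class_id: weight}; rare classes get larger weights
    """
    total, norm = 0.0, 0.0
    for lp, t in zip(log_probs, targets):
        w = class_weights[t]
        total += -w * lp[t]   # weighted NLL for this time step
        norm += w
    return total / norm       # normalize by the summed weights

# Toy example: two classes; class 1 is rare, so it is weighted 4x.
log_probs = [{0: math.log(0.9), 1: math.log(0.1)},
             {0: math.log(0.2), 1: math.log(0.8)}]
targets = [0, 1]
loss = weighted_nll(log_probs, targets, {0: 1.0, 1: 4.0})
```

With the 4x weight on class 1, the second time step dominates the loss; with uniform weights both steps would contribute equally. The class weights themselves are a free choice (inverse frequency is a typical starting point).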
