Hello All,
I have a dataset with 7 classes and about 10,000 images, and the class distribution is imbalanced.
I am planning to use PNAS-Large as my base network.
What strategies should I follow? What percentage of the parameters should be frozen?
Thanks,
Regards,
Milton
Assuming more data (1 in above) is out of the picture, my go-tos for imbalanced datasets are stratified sampling (3 in above) and weighted loss (6 in above). See WeightedRandomSampler (forums) and the cross-entropy loss weight parameter, respectively.
Weighted loss is a little easier to implement, so that's usually where I start. Stratification is touchy: weighting so every class is drawn evenly often doesn't generalize well, and finding the sweet spot can be a pain, especially when you have several (or really just more than 2) classes.
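As a minimal sketch of the weighted-loss approach: pass per-class weights to `nn.CrossEntropyLoss` so that errors on rare classes cost more. The class counts below are made up for illustration; in practice you would count them from your own training set.

```python
import torch
import torch.nn as nn

# Hypothetical per-class sample counts for a 7-class imbalanced dataset.
class_sample_count = torch.tensor([500., 3000., 1200., 800., 2500., 1500., 500.])

# Inverse-frequency weights, normalized so they sum to the number of classes.
weights = 1.0 / class_sample_count
weights = weights / weights.sum() * len(class_sample_count)

# Rare classes now contribute more to the loss than frequent ones.
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, 7)            # a batch of 4 predictions over 7 classes
targets = torch.tensor([0, 1, 2, 3])  # ground-truth labels for the batch
loss = criterion(logits, targets)
```

Fully uniform weighting (as above) is the usual starting point; if it hurts generalization, you can soften it, e.g. by raising the inverse frequencies to a power between 0 and 1.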
Hi Dylan,
Thank you so much.
The following worked for me:

import numpy as np
import torch

num_classes = 7
train_data = get_train_data()  # each row holds (sample, label)

# Count samples per class; the label is in column 1 of each row.
class_sample_count = np.zeros(num_classes)
for row in train_data:
    class_sample_count[int(row[1])] += 1

# Inverse class frequency: rare classes get larger weights.
class_weights = 1.0 / (class_sample_count / len(train_data))

# Assign each sample the weight of its class.
weights = torch.Tensor([class_weights[int(row[1])] for row in train_data])

sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, len(weights))
trainloader = torch.utils.data.DataLoader(train_data_set, batch_size=batch_size, sampler=sampler)