I would like to create a dataset that is imbalanced across classes. For example, I want 5 classes that make up 30%
of the data and 5 other classes that make up 70% of the data. I tried using Subset, but it doesn't work.
I would appreciate some help, thanks.
This is what I did:
_, set1 = createDataset(trainset, np.array([0, 2, 4, 6, 8]), isTrain=True, batchSize=1, test=False)
set1 = partialDataset(set1, num=1750, class_num=5)
_, set2 = createDataset(trainset, np.array([1, 3, 5, 7, 9]), isTrain=True, batchSize=1, test=False)
set2 = partialDataset(set2, num=750, class_num=5)
trainset = set1.dataset + set2.dataset
(createDataset returns the trainset restricted to the classes listed in the array.)
partialDataset is the following function:
import numpy as np
import torch
from sklearn.model_selection import train_test_split

def partialDataset(trainset, num=0, class_num=10):
    # Draw num * class_num indices, stratified by class label
    indices = np.arange(len(trainset))
    train_indices, _ = train_test_split(indices, train_size=num * class_num, stratify=trainset.targets)
    # Wrap the selected indices into a Subset
    train_dataset = torch.utils.data.Subset(trainset, train_indices)
    return train_dataset
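To make the goal concrete, here is a minimal sketch of the index selection I am after, written in plain Python so it runs without a dataset. The helper name `imbalanced_indices`, the toy `targets` list, and the per-class counts are all hypothetical stand-ins, not part of my actual code; it assumes the labels are available as a flat sequence of integers, like `trainset.targets`.

```python
import random
from collections import defaultdict


def imbalanced_indices(targets, per_class_counts, seed=0):
    """Pick per_class_counts[c] random sample indices for each class c."""
    rng = random.Random(seed)
    # Group sample positions by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[int(label)].append(idx)
    chosen = []
    for cls, count in per_class_counts.items():
        # rng.sample raises ValueError if a class has fewer than `count` samples.
        chosen.extend(rng.sample(by_class[cls], count))
    rng.shuffle(chosen)
    return chosen


# Toy stand-in for trainset.targets: 10 classes with 100 samples each.
targets = [c for c in range(10) for _ in range(100)]
# Even classes get 70 samples each, odd classes 30 each (a 70% / 30% split).
counts = {c: (70 if c % 2 == 0 else 30) for c in range(10)}
indices = imbalanced_indices(targets, counts)
```

A list like `indices` could then be passed to `torch.utils.data.Subset(trainset, indices)`. One thing that may be relevant to my problem: the `.dataset` attribute of a `Subset` refers to the underlying, unsampled dataset, so combining the `Subset` objects themselves (for example with `torch.utils.data.ConcatDataset`) is what would keep the sampled counts.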
Thanks a lot!