Hi,

I would like to create a dataset that is imbalanced across classes. For example, I want 5 classes to make up 30% of the data and the other 5 classes to make up 70%. I tried using Subset, but it doesn't work.

I would appreciate some help, thanks.

This is what I did:

_, set1 = createDataset(trainset, np.array([0, 2, 4, 6, 8]), isTrain=True, batchSize=1, test=False)

set1 = partialDataset(set1, num=1750, class_num=5)

_, set2 = createDataset(trainset, np.array([1, 3, 5, 7, 9]), isTrain=True, batchSize=1, test=False)

set2 = partialDataset(set2, num=750, class_num=5)

trainset = set1.dataset + set2.dataset

(createDataset returns the trainset restricted to the classes that are in the array.)

partialDataset is the following function:

```
import numpy as np
import torch
from sklearn.model_selection import train_test_split

def partialDataset(trainset, num=0, class_num=10):
    # Keep num * class_num samples in total, stratified by label
    indices = np.arange(len(trainset))
    train_indices, _ = train_test_split(indices, train_size=num * class_num, stratify=trainset.targets)
    # Wrap into a Subset
    train_dataset = torch.utils.data.Subset(trainset, train_indices)
    return train_dataset
```
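For reference, here is a minimal sketch of the kind of split described above, not a definitive solution. It only selects indices and uses toy labels in place of trainset.targets. The names imbalanced_indices, per_class, and toy_targets are hypothetical and do not come from the code above.

```python
import random
from collections import defaultdict

def imbalanced_indices(targets, per_class, seed=0):
    """Pick per_class[c] random sample indices for each class c.

    targets   : sequence of integer labels, one per sample
    per_class : dict mapping class label -> number of samples to keep
    (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)
    # Group sample indices by their label.
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[int(label)].append(idx)

    chosen = []
    for label, count in per_class.items():
        pool = by_class[label]
        if count > len(pool):
            raise ValueError(f"class {label} has only {len(pool)} samples")
        # Sample without replacement inside each class.
        chosen.extend(rng.sample(pool, count))
    rng.shuffle(chosen)
    return chosen

# Toy labels standing in for trainset.targets: 10 classes, 100 samples each.
toy_targets = [c for c in range(10) for _ in range(100)]
# Even classes keep 70 samples each, odd classes keep 30 each (a 70% / 30% split).
per_class = {c: (70 if c % 2 == 0 else 30) for c in range(10)}
indices = imbalanced_indices(toy_targets, per_class)
```

With a real dataset that exposes a targets attribute (an assumption; torchvision datasets such as CIFAR10 do), the resulting indices could be passed to torch.utils.data.Subset(trainset, indices). Note that the dataset attribute of a Subset refers to the full underlying dataset and not to the selected samples, so it may be worth checking whether set1.dataset + set2.dataset really contains only the subsets.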

Thanks a lot!

Eilon