Splitting data into imbalanced data

Hi,
I would like to create a dataset that is imbalanced across classes. For example, I want 5 classes that make up 30%
of the data and 5 other classes that make up 70% of the data. Using Subset doesn't work for this.
I would appreciate some help, thanks.
This is what I did:
_,set1 = createDataset(trainset, np.array([0,2,4,6,8]), isTrain=True,batchSize=1, test=False)
set1 = partialDataset(set1, num=1750,class_num=5)
_,set2 = createDataset(trainset, np.array([1,3,5,7,9]), isTrain=True,batchSize=1, test=False)
set2 = partialDataset(set2, num=750,class_num=5)
trainset = set1.dataset + set2.dataset

(createDataset returns the trainset restricted to the classes in the array.)
partialDataset is the following function:

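import numpy as np
import torch
from sklearn.model_selection import train_test_split

def partialDataset(trainset, num=0, class_num=10):
    indices = np.arange(len(trainset))
    # Stratified split: keeps the class proportions of trainset.targets
    train_indices, _ = train_test_split(indices, train_size=num * class_num, stratify=trainset.targets)

    # Wrap the selected indices into a Subset
    train_dataset = torch.utils.data.Subset(trainset, train_indices)
    return train_dataset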

thanks a lot!
Eilon

Since you are using stratify=trainset.targets, train_test_split preserves the class proportions of trainset.targets in each split, so a balanced dataset gives balanced splits, as described in the docs of sklearn.model_selection.train_test_split.
If you want to manually create imbalanced splits, you could sample the desired number of indices for each class from trainset.targets and create a new Subset from the resulting sample distribution.
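
The manual sampling suggested above could look roughly like the sketch below. It is only an illustration, not a drop-in replacement for partialDataset: the helper name sample_imbalanced_indices, its parameters, and the toy label list are all made up for this example, and it assumes the labels are available as a plain sequence of integers (for instance via trainset.targets, as in the question).

```python
import random
from collections import defaultdict


def sample_imbalanced_indices(targets, per_class_counts, seed=0):
    """Return dataset indices with a chosen number of samples per class.

    targets: sequence of integer class labels, one per sample.
    per_class_counts: dict mapping class label -> number of samples to keep.
    (Hypothetical helper, written for this example only.)
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[int(label)].append(idx)

    chosen = []
    for label, count in per_class_counts.items():
        available = by_class[label]
        if count > len(available):
            raise ValueError(
                f"class {label}: requested {count}, only {len(available)} available"
            )
        # Sample without replacement inside each class.
        chosen.extend(rng.sample(available, count))

    # Shuffle so the classes are not grouped together in the index list.
    rng.shuffle(chosen)
    return chosen


# Toy labels standing in for trainset.targets: 10 classes, 100 samples each.
targets = [c for c in range(10) for _ in range(100)]

# Even classes together make up 70% of the subset, odd classes 30%.
counts = {c: (70 if c % 2 == 0 else 30) for c in range(10)}
indices = sample_imbalanced_indices(targets, counts)
```

With a real dataset, the resulting list can be passed to torch.utils.data.Subset(trainset, indices). If trainset.targets is a tensor, converting it first with trainset.targets.tolist() should work. Fixing the per-class counts directly (instead of stratifying) is what lets the class distribution of the subset differ from that of the original data.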