Issues with torch.utils.data.random_split

monster · May 18, 2020, 7:51am

Sure. Right now I am using this:

import torch

trainset = core.Dataset(dataset_path)
train_len = int(len(trainset) * 0.8)
val_len = len(trainset) - train_len
# Sequential 80/20 split by index, no shuffling
train_set = torch.utils.data.Subset(trainset, range(0, train_len))
val_set = torch.utils.data.Subset(trainset, range(train_len, len(trainset)))

But I want to shuffle the train and val sets by file name rather than by index. If I shuffle by index, samples that come from the same file might end up in both train and test.

For example: the file 'cat1.jpg' has 3 cats, stored at indices 0, 1, and 2 in 'cat1.xml'. I don't want indices 0 and 1 in train and index 2 in the test or validation set. I want all three indices of the same file to be either in train or in test.
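
A minimal sketch of what such a file-level split could look like, in plain Python. It assumes there is some way to get, for every sample index, the name of the file it came from. The `file_names` list and the `split_indices_by_file` helper below are hypothetical names for illustration, not part of `core.Dataset` or PyTorch. The idea is to shuffle the file names and then collect the indices per file, so no file is divided between the two sets.

```python
import random
from collections import defaultdict


def split_indices_by_file(file_names, train_fraction=0.8, seed=0):
    """Split sample indices so all samples of one file land on the same side.

    file_names[i] is the source file of sample i (hypothetical input; the
    dataset has to expose this mapping somehow).
    """
    # Group sample indices by the file they came from.
    groups = defaultdict(list)
    for index, name in enumerate(file_names):
        groups[name].append(index)

    # Shuffle the files, not the individual samples.
    files = sorted(groups)
    random.Random(seed).shuffle(files)

    # The fraction applies to the number of files, so the per-sample
    # ratio is only approximately train_fraction.
    n_train_files = int(len(files) * train_fraction)
    train_idx = [i for f in files[:n_train_files] for i in groups[f]]
    val_idx = [i for f in files[n_train_files:] for i in groups[f]]
    return train_idx, val_idx


# Toy example: cat1.jpg contributes samples 0, 1 and 2.
file_names = ["cat1.jpg", "cat1.jpg", "cat1.jpg", "cat2.jpg",
              "dog1.jpg", "dog1.jpg", "dog2.jpg", "dog3.jpg"]
train_idx, val_idx = split_indices_by_file(file_names)
```

The resulting index lists could then be passed to `torch.utils.data.Subset(trainset, train_idx)` and `torch.utils.data.Subset(trainset, val_idx)` in place of the `range(...)` objects. If scikit-learn is available, `sklearn.model_selection.GroupShuffleSplit` with the file names as `groups` is another option for the same kind of split.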