Few shot learning

Arthur_Zakirov · August 25, 2021, 9:42am

Hello everyone,

I’m trying to implement a training method that trains the model on dataset A first and then continues training on dataset B. Both datasets should be shuffled independently of each other. The class torch.utils.data.ConcatDataset can combine two datasets, but as far as I know it does no shuffling. The DataLoader class can shuffle, but it would mix the two datasets together. So how do I shuffle each dataset before concatenating them?

Thanks in advance

ptrblck · August 26, 2021, 5:05am

Probably not the cleanest approach, but you could create the indices for both datasets first, shuffle them, wrap both datasets into a ConcatDataset, and use a Subset with the shuffled indices afterwards.
Something like this might work:

import torch

# create datasets
dataset1 = torch.utils.data.TensorDataset(torch.arange(10))
dataset2 = torch.utils.data.TensorDataset(torch.arange(10, 20))
dataset = torch.utils.data.ConcatDataset((dataset1, dataset2))

# get indices
idx1 = torch.arange(len(dataset1))
idx2 = len(idx1) + torch.arange(len(dataset2))

# shuffle
idx1 = idx1[torch.randperm(len(idx1))]
idx2 = idx2[torch.randperm(len(idx2))]
idx = torch.cat((idx1, idx2))

# use shuffled indices to create Subset (converted to plain Python ints)
dataset_shuffled = torch.utils.data.Subset(dataset, idx.tolist())

loader = torch.utils.data.DataLoader(
    dataset_shuffled, shuffle=False, batch_size=2)

for data in loader:
    print(data)
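
Note that the indices above are shuffled only once, so every epoch iterates the samples in the same order. If a fresh shuffle per epoch is wanted, one option might be a custom sampler that draws a new permutation for each dataset every time it is iterated. The following is only a rough sketch under that assumption; the class name `BlockShuffleSampler` is made up for illustration and is not part of PyTorch:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Sampler, TensorDataset


class BlockShuffleSampler(Sampler):
    """Hypothetical helper: shuffles inside each dataset, keeps the dataset order."""

    def __init__(self, lengths, generator=None):
        self.lengths = list(lengths)   # sizes of the concatenated datasets, in order
        self.generator = generator     # optional torch.Generator for reproducibility

    def __iter__(self):
        offset = 0
        for n in self.lengths:
            # new permutation of this dataset's index range on every iteration
            perm = torch.randperm(n, generator=self.generator) + offset
            yield from perm.tolist()
            offset += n

    def __len__(self):
        return sum(self.lengths)


dataset1 = TensorDataset(torch.arange(10))
dataset2 = TensorDataset(torch.arange(10, 20))
dataset = ConcatDataset((dataset1, dataset2))

sampler = BlockShuffleSampler([len(dataset1), len(dataset2)])
loader = DataLoader(dataset, sampler=sampler, batch_size=2)

for epoch in range(2):
    # flatten the batches to inspect the sample order of this epoch
    order = [int(x) for (batch,) in loader for x in batch]
    print(epoch, order)
```

Each epoch should first yield all samples from dataset1 in a new random order, followed by all samples from dataset2. If the length of the first dataset is not divisible by the batch size, one batch would contain samples from both datasets, so that case might need extra handling.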