Hi all,
I am trying to use DistributedSampler and DataLoader with my custom dataset and model, but I'm having trouble restricting the DataLoader to specific indices.
I want to train my model by randomly shuffling only a specific subset of indices from the whole dataset.
For example, here is my code:
from torch.utils.data.distributed import DistributedSampler as DistributedSampler
train_dataset = MyDataset(args.data_dir, split='xxx', input_size=input_size)
train_sampler = DistributedSampler(train_dataset, num_replicas=args.gpus, rank=gpu)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=int(args.batch_size/args.gpus), shuffle=False, sampler=train_sampler, collate_fn=train_dataset.collate_fn, num_workers=int(args.workers/args.gpus), pin_memory=True)
I think the sampler is the part I need to modify, but I have no idea how to make it use only specific indices.
Is there a way to implement what I want there?
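One approach I've considered (just a sketch, not verified in an actual multi-GPU run) is to first restrict the dataset with torch.utils.data.Subset and then hand that subset to DistributedSampler, which shuffles and shards whatever dataset it is given. Here a toy TensorDataset stands in for MyDataset, and num_replicas/rank are hard-coded for illustration:

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-in for MyDataset: sample value == its index in the full dataset.
full_dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))

# The specific indices I want to train on (example values).
specific_indices = [10, 99, 5, 8, 42, 7, 63, 21]

# Restrict the dataset to those indices first...
subset = Subset(full_dataset, specific_indices)

# ...then let DistributedSampler shuffle and shard the *subset* across replicas.
# Passing num_replicas/rank explicitly avoids needing init_process_group here.
sampler = DistributedSampler(subset, num_replicas=2, rank=0, shuffle=True)

loader = DataLoader(subset, batch_size=2, shuffle=False, sampler=sampler)

sampler.set_epoch(0)  # call once per epoch so each epoch reshuffles differently
seen = [int(x) for (batch,) in loader for x in batch]
print(seen)  # this rank's shard: values drawn only from specific_indices
```

With two replicas this rank should see half of the eight chosen samples, all drawn from specific_indices, so I assume the same pattern would apply with args.gpus replicas and rank=gpu in my code above.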
P.S.
When I used DataParallel, I could use specific indices like below:
specific_indices = [10, 99, 5, 8, …]
train_dataset = MyDataset(root=xxx, transform=xxx)
train_loader = DataLoader(train_dataset, batch_size=args.batch_size, num_workers=args.num_workers, sampler=SubsetRandomSampler(specific_indices), collate_fn=detection_collate, pin_memory=True)
Thank you in advance.