Hi, I am confused about the drop_last parameter of DistributedSampler and DataLoader when training with DDP. Both classes accept drop_last. What is the best practice for setting it on each of them, for the training and validation datasets?
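For the training dataset:
import torch

# Sampler keeps the tail (pads so every rank gets the same count); loader drops the last partial batch.
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset, shuffle=True, drop_last=False)
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size_per_gpu,
    shuffle=(train_sampler is None),  # must be False whenever a sampler is passed
    num_workers=workers_per_gpu,
    sampler=train_sampler,
    drop_last=True,
)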
For the validation dataset:
# Neither the sampler nor the loader drops anything here.
val_sampler = torch.utils.data.distributed.DistributedSampler(val_dataset, shuffle=False, drop_last=False)
val_loader = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=batch_size_per_gpu,
    shuffle=False,
    num_workers=workers_per_gpu,
    sampler=val_sampler,
    drop_last=False,
)
Are these the correct ways to set up the DistributedSampler and DataLoader? If so, for val_loader the last batch might not divide evenly across the GPUs. It would be really helpful if someone could explain in detail how the two drop_last settings interact.
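To make the question concrete, here is a minimal sketch of the per-rank arithmetic as I understand it from the DistributedSampler docs. It is plain Python, not the library itself: the helper name per_rank_plan and the numbers (103 samples, 4 GPUs, batch size 8 per GPU) are made up for illustration, and the padding/truncation rule is my reading of the documented behavior, so please correct me if it is wrong.

```python
import math

def per_rank_plan(dataset_len, world_size, batch_size,
                  sampler_drop_last, loader_drop_last):
    """Return (samples_per_rank, batch_sizes) that one rank would see.

    Hypothetical helper: mirrors my understanding of DistributedSampler
    followed by DataLoader, it does not call PyTorch.
    """
    if sampler_drop_last and dataset_len % world_size != 0:
        # Sampler drops the tail so every rank gets the same count.
        samples = math.ceil((dataset_len - world_size) / world_size)
    else:
        # Sampler pads by repeating indices so every rank gets the same count.
        samples = math.ceil(dataset_len / world_size)

    # Each rank's DataLoader then batches only that rank's samples.
    full, rest = divmod(samples, batch_size)
    sizes = [batch_size] * full
    if rest and not loader_drop_last:
        sizes.append(rest)
    return samples, sizes

# Settings from my training loader: sampler keeps the tail, loader drops it.
train = per_rank_plan(103, 4, 8, sampler_drop_last=False, loader_drop_last=True)
# Settings from my validation loader: nothing is dropped.
val = per_rank_plan(103, 4, 8, sampler_drop_last=False, loader_drop_last=False)
print(train)
print(val)
```

If this reading is right, every rank always gets the same number of samples, so the last validation batch is smaller than batch_size_per_gpu but has the same size on every GPU. The cost is that with drop_last=False on the sampler some validation samples are duplicated as padding (one sample in this example, since 4 * 26 = 104), which would slightly skew metrics averaged across ranks.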