I am doing distributed training with the MNIST dataset. By default, MNIST is only split into a training and a testing set. I would like to further split the training set into a training and a validation set.
I could do this as follows:
import numpy as np
import torch
from torchvision import datasets, transforms

# Shuffle the indices of the 60,000 MNIST training images
indices = np.arange(60000)
np.random.shuffle(indices)

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])
dataset = datasets.MNIST('mnist', download=True, train=True, transform=transform)

# Build the train loader from the first 55,000 shuffled indices
train_loader = torch.utils.data.DataLoader(
    dataset, batch_size=64, shuffle=False,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:55000]))

# Build the validation loader from the remaining 5,000 indices
val_loader = torch.utils.data.DataLoader(
    dataset, batch_size=64, shuffle=False,
    sampler=torch.utils.data.SubsetRandomSampler(indices[55000:]))
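As an aside, I believe the same split can be written more compactly with torch.utils.data.random_split, which returns two Subset views of one dataset (a minimal sketch of that alternative):

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])
dataset = datasets.MNIST('mnist', download=True, train=True, transform=transform)

# Split the 60,000 training images into 55,000 train and 5,000 validation samples
train_set, val_set = torch.utils.data.random_split(dataset, [55000, 5000])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, shuffle=False)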
But how do I incorporate distributed training on top of this split? Before (without the split) I was doing it as follows:
import logging
import torch
from torchvision import datasets, transforms

logger = logging.getLogger(__name__)

# Gets data from training_dir (an S3 bucket path)
def _get_train_data_loader(batch_size, training_dir, is_distributed, **kwargs):
    logger.info("Get train data loader")
    dataset = datasets.MNIST(training_dir, train=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))
    # Shard the dataset across workers when running distributed
    train_sampler = torch.utils.data.distributed.DistributedSampler(dataset) if is_distributed else None
    return torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=train_sampler is None,
                                       sampler=train_sampler, **kwargs)
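For context, the loader is consumed roughly like this in a typical training loop (assuming the process group has already been initialized); when a DistributedSampler is in use, set_epoch should be called every epoch so the shuffle order changes between epochs. num_epochs and the training step here are placeholders:

train_loader = _get_train_data_loader(64, 'mnist', is_distributed=True)
for epoch in range(num_epochs):  # num_epochs: placeholder
    # Re-seed the distributed sampler so each epoch shuffles differently
    if isinstance(train_loader.sampler, torch.utils.data.distributed.DistributedSampler):
        train_loader.sampler.set_epoch(epoch)
    for data, target in train_loader:
        pass  # forward/backward pass goes here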
The problem is that a DataLoader accepts only a single sampler argument, so I do not know how to combine the train/validation split (the SubsetRandomSampler) with the DistributedSampler when creating the dataloader.
I found https://github.com/pytorch/pytorch/issues/23430, which seems related, but since I am a beginner I am not really able to make sense of it.
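From what I can piece together from that issue, instead of passing two samplers, one idea would be to wrap the chosen training indices in torch.utils.data.Subset and build the DistributedSampler over that subset, so the DataLoader still receives a single sampler. A sketch of what I think this would look like (I am not sure it is correct; the fixed seed is my assumption so that every worker computes the identical split):

import numpy as np
import torch
from torchvision import datasets, transforms

def _get_train_data_loader(batch_size, training_dir, is_distributed, **kwargs):
    dataset = datasets.MNIST(training_dir, train=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))
    # Every worker must compute the identical split, hence the fixed seed
    rng = np.random.RandomState(0)
    indices = rng.permutation(len(dataset))
    # Wrap the first 55,000 shuffled indices as the training subset
    # (indices[55000:] would be kept aside for a plain validation loader)
    train_subset = torch.utils.data.Subset(dataset, indices[:55000].tolist())
    # A single sampler: DistributedSampler shards the subset across workers
    train_sampler = (torch.utils.data.distributed.DistributedSampler(train_subset)
                     if is_distributed else None)
    return torch.utils.data.DataLoader(train_subset, batch_size=batch_size,
                                       shuffle=train_sampler is None,
                                       sampler=train_sampler, **kwargs)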