I defined a simple custom dataset like in the official example and named the class “Dataset”. Then in my training method I split it into training and validation sets like this:
dataset = Dataset(opt)
train_size = int(0.8*len(dataset))
val_size = len(dataset) - train_size
lengths = [train_size, val_size]
train_dataset, val_dataset = torch.utils.data.random_split(dataset, lengths)
trainloader = DataLoader(
train_dataset,
batch_size=opt.n_batches,
shuffle=True,
num_workers=opt.n_workers,
pin_memory=True
)
valloader = DataLoader(
val_dataset,
batch_size=1,
shuffle=True,
num_workers=opt.n_workers,
pin_memory=True
)
In the __getitem__(self, idx) method of the dataset I would like to have different functionality for the validation set than for the training set (e.g. applying augmentation only during training). How can I distinguish between the two inside __getitem__? Should I initialize the two datasets independently instead?
I am basically looking for a variable like self.training
in the nn.Module class (see here).
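To make the problem concrete, here is a minimal sketch of what I mean (all names hypothetical): a dataset with a training flag toggled by train()/eval() methods, mimicking nn.Module. It also shows why random_split alone doesn't solve it, since both subsets share one underlying dataset object:

```python
import torch
from torch.utils.data import Dataset, random_split

class MyDataset(Dataset):
    """Hypothetical dataset with an nn.Module-style training flag."""

    def __init__(self, n=10):
        self.data = torch.arange(n, dtype=torch.float32)
        self.training = True  # hypothetical flag, like nn.Module.training

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        if self.training:
            # e.g. random augmentation applied only in training mode
            x = x + torch.randn(()) * 0.1
        return x

dataset = MyDataset()
train_ds, val_ds = random_split(dataset, [8, 2])
```

The catch: train_ds and val_ds are Subset views of the same object, so toggling dataset.eval() changes the behavior of both at once. That is exactly why I'm unsure whether a flag like this can work, or whether two independently constructed datasets are the cleaner way.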