You are right in your understanding.
Yeah, that’s the issue I mentioned in my first post.
If I understand you correctly, you need pairs of training and validation samples, so your validation data will be repeated (in your example 6 times) to match the length of the training data. Is that correct?
You could artificially make the validation `Dataset` larger with this somewhat ugly hack:
```python
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataTrain(Dataset):
    def __init__(self, length):
        self.data = torch.randn(length, 1)
        self.target = torch.Tensor(length).uniform_(0, 10).long()

    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        return len(self.data)


class MyDataVal(Dataset):
    def __init__(self, length, fake_length):
        self.data = torch.randn(length, 1)
        self.target = torch.Tensor(length).uniform_(0, 10).long()
        self.fake_length = fake_length
        self.real_length = len(self.data)

    def __getitem__(self, index):
        # Wrap around, so indices beyond the real length reuse the same samples
        index = index % self.real_length
        x = self.data[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        return self.fake_length


train_dataset = MyDataTrain(length=60000)
val_dataset = MyDataVal(length=10000, fake_length=60000)

train_loader = DataLoader(train_dataset)
val_loader = DataLoader(val_dataset)

for batch_idx, (train_batch, val_batch) in enumerate(zip(train_loader, val_loader)):
    # your code here
    ...
```
Note the modulo operation in `__getitem__` of the validation `Dataset`. It will yield `fake_length` samples, iterating repeatedly over the same smaller dataset.
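As a quick sanity check (a minimal sketch, assuming the classes above and the default `batch_size=1`), you can confirm that the zipped loaders now run for as many steps as the training set has samples:

```python
# With fake_length=60000, zip(train_loader, val_loader) is no longer cut short
# at 10000 iterations by the smaller validation loader.
num_pairs = sum(1 for _ in zip(train_loader, val_loader))
print(num_pairs)  # 60000
```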
Can you work with this?