You are right in your understanding.
Yeah, that’s the issue I mentioned in my first post.
If I understand you correctly, you need pairs of training and validation samples, so your validation data will be repeated (in your example 6 times) to match the length of the training data. Is that correct?
You could artificially make the validation `Dataset` larger with this somewhat ugly hack:
```python
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataTrain(Dataset):
    def __init__(self, length):
        self.data = torch.randn(length, 1)
        self.target = torch.Tensor(length).uniform_(0, 10).long()

    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        return len(self.data)


class MyDataVal(Dataset):
    def __init__(self, length, fake_length):
        self.data = torch.randn(length, 1)
        self.target = torch.Tensor(length).uniform_(0, 10).long()
        self.fake_length = fake_length
        self.real_length = len(self.data)

    def __getitem__(self, index):
        # Wrap around, so indices beyond the real length reuse the same samples
        index = index % self.real_length
        x = self.data[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        return self.fake_length


train_dataset = MyDataTrain(length=60000)
val_dataset = MyDataVal(length=10000, fake_length=60000)

train_loader = DataLoader(train_dataset)
val_loader = DataLoader(val_dataset)

for batch_idx, (train_batch, val_batch) in enumerate(zip(train_loader, val_loader)):
    # your code here
    ...
```
Note the modulo operation in `__getitem__` of the validation `Dataset`. It will yield `fake_length` samples, iterating repeatedly over the same smaller dataset.
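As a quick sanity check (a minimal sketch, assuming the classes above and the default `batch_size=1`), you can confirm that the zipped loaders now run for as many steps as the training set has samples:

```python
# With fake_length=60000, zip(train_loader, val_loader) is no longer cut short
# at 10000 iterations by the smaller validation loader.
num_pairs = sum(1 for _ in zip(train_loader, val_loader))
print(num_pairs)  # 60000
```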
Can you work with this?