Create Dataset class with different input sizes

My PyTorch model contains two CNNs whose outputs are merged and passed through a series of fully-connected layers. The inputs of the two CNNs are matrices; the problem is that for the first CNN the matrices have shape 128x100, while for the second they have shape 128x1000. I'm now trying to write a Dataset class to build the data loaders. So far I have the following:

import torch
from torch.utils.data import Dataset


class Data(Dataset):
    
    def __init__(self, dataP, targetP, dataC, targetC, transform=None):
        # Inputs and targets for the first CNN branch (128x100 matrices)
        self.dataP = [torch.from_numpy(X).int() for X in dataP]
        self.targetP = [torch.from_numpy(y).float() for y in targetP]
        
        # Inputs and targets for the second CNN branch (128x1000 matrices)
        self.dataC = [torch.from_numpy(X).int() for X in dataC]
        self.targetC = [torch.from_numpy(y).float() for y in targetC]
        
        self.transform = transform
    
    def __getitem__(self, index):
        # One sample per branch; assumes dataP and dataC are index-aligned
        Xp = self.dataP[index]
        yp = self.targetP[index]
        
        Xc = self.dataC[index]
        yc = self.targetC[index]
        
        if self.transform:
            Xp = self.transform(Xp)
            Xc = self.transform(Xc)
            
        return Xp, yp, Xc, yc
    
    def __len__(self):
        return len(self.dataP)
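
For completeness, this is roughly how I build the loader from it (a minimal sketch; the batch size and variable names are just placeholders):

from torch.utils.data import DataLoader

dataset = Data(dataP, targetP, dataC, targetC)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for Xp, yp, Xc, yc in loader:
    # Xp batches have shape (batch, 128, 100), Xc batches (batch, 128, 1000)
    ...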

While the code seems to run without any problem, I'm fairly sure something is wrong, since in the __len__ method I return the length of only one of the inputs. Is there a proper way to handle inputs of different sizes?

Maybe this would be safer:

def __len__(self): 
    return min(len(self.dataP), len(self.dataC))

Otherwise you could get out-of-range errors if len(self.dataP) > len(self.dataC). The drawback of this approach is that it could skip a lot of training data. Are your datasets aligned? I.e., must the 10th sample of dataP be trained together with the 10th sample of dataC?
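
If they are supposed to be aligned one-to-one, another option is to fail fast instead of silently dropping samples, e.g. by asserting equal lengths in __init__ (just a sketch):

def __init__(self, dataP, targetP, dataC, targetC, transform=None):
    # Refuse mismatched inputs up front rather than truncating in __len__
    assert len(dataP) == len(dataC), \
        f"dataP has {len(dataP)} samples, dataC has {len(dataC)}"
    ...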
