A dataloader for multiple similar inputs

Hi All,
I have a network that takes three images in the input layer. The three images must be frames of the same video (this can be known only from the image filenames). I understood how to change the Dataset’s __getitem__ to send multiple inputs from here. But how do I make sure that three images from the same video are chosen and that none of them are repeated in the same epoch?

It depends a bit on the current way of storing these files.
Did you store each image from a video in a corresponding folder?
If so, how would you like to sample the files if their number is not evenly divisible by 3? Would you like to drop the last files or pad with repeated frames?

Should the frames be contiguous in a sample or would you like to shuffle the frames from a single video?
Also, would you like to shuffle the video folders or should the videos be loaded in a consecutive way?

Thanks for replying, @ptrblck
Currently all images are together in the test, train, and val folders. If the number is not divisible by 3, dropping the last frames is no problem. The frames from a single video need not be contiguous; it’s fine as long as they are from the same video.

The video folders (I can change my dataset to that format) can be shuffled freely. There’s no order among the videos, and the order of the frames in a single video can also be ignored.

Here is a small dummy example using multiple video folders.
Note that I’ve used tensors directly, so you should add your frame loading logic into the Dataset.

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self, videos, transform=None, nb_frames=3):
        self.nb_frames = nb_frames
        self.transform = transform
        
        # crop each video to a multiple of nb_frames (drop the trailing frames)
        self.data = [v[:-(len(v) % self.nb_frames)] if len(v) % self.nb_frames != 0 else v
                     for v in videos]
        
        # number of samples (groups of nb_frames) per video
        self.lens = [len(d) // self.nb_frames for d in self.data]
        # cumulative offsets used to map a global index to a video
        self.offsets = np.concatenate(([0], np.cumsum(self.lens[:-1])))
        
    
    def __getitem__(self, index):
        print('index: {}'.format(index))
        # map the global index to the corresponding video
        found = False
        for i, offset in enumerate(self.offsets):
            if index < offset:
                # subtract the video's offset and scale to the first frame index
                print('subtracting {} from index'.format(self.offsets[i-1]))
                index -= self.offsets[i-1]
                index *= self.nb_frames
                found = True
                break
        # handle the last video separately
        if not found:
            index -= self.offsets[-1]
            index *= self.nb_frames
            i += 1

        # select the corresponding video data
        print('selecting video {}'.format(i-1))
        data = self.data[i-1]
        # load nb_frames consecutive frames
        print('reading frames {}'.format([idx for idx in range(index, index+self.nb_frames)]))
        x = []
        for idx in range(index, index+self.nb_frames):
            tmp = data[idx]
            if self.transform:
                tmp = self.transform(tmp)
            x.append(tmp)
        x = torch.cat(x)
        
        return x
        
    def __len__(self):
        return np.sum(self.lens)


videos = [torch.ones(int(torch.randint(3, 12, (1,))), 1) * i for i in range(5)]  # dummy videos with a random number of frames

dataset = MyDataset(videos)
for data in dataset:
    print(data)

loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True)
for data in loader:
    print(data)

The code currently uses nb_frames consecutive frames from each video folder and removes the trailing frames.
Shuffling via a DataLoader will work.
I’ve also added some debug print statements for better understanding, but let me know if you need more information about this code.
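
If it helps, here is a rough sketch of how the tensor indexing inside __getitem__ could be replaced by actual frame loading; load_frames is a hypothetical helper (not part of the snippet above) and assumes each video is stored as a list of frame file paths:

from PIL import Image
import torch
from torchvision import transforms

def load_frames(paths, transform=None):
    # hypothetical helper: read a list of frame file paths into a single tensor
    to_tensor = transform if transform is not None else transforms.ToTensor()
    frames = [to_tensor(Image.open(p).convert('RGB')) for p in paths]
    return torch.stack(frames)  # shape: [nb_frames, C, H, W]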


Hi @ptrblck, I was wondering how the concept of ordinary dataloaders fits with this particular Dataset. That is, how does the ordinary structure of -

train/  
   classA/
   classB/
val/
   classA/
   classB/ 
test/ 
   classA/ 
   classB/ 

fit with this type of loading? Inside a particular train, test, or val folder, do we have to pass the paths of the subfolders corresponding to each of the classes ourselves? (This was detected automatically in the default approach.)

I would create separate Datasets for the train, val, and test folders, so that you can stick to your current Dataset implementation and have a clean cut between the data splits to avoid data leakage.
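
Something along these lines, where load_videos_from is a placeholder (not from this thread) for however you read the frames of each split folder into the videos list:

# hypothetical setup: one Dataset (and DataLoader) per split to keep the splits separated
train_dataset = MyDataset(load_videos_from('train'))
val_dataset = MyDataset(load_videos_from('val'))
test_dataset = MyDataset(load_videos_from('test'))

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)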

Thanks, that makes much more sense. Should I worry about the labels, or will they be handled automatically?

In my code snippet I’m not handling the labels currently, so you would need to add them for your use case.
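
As a minimal sketch of one way to do that (the labels argument and the MyLabeledDataset name are my own additions, assuming one label per video folder):

class MyLabeledDataset(MyDataset):
    # hypothetical extension: attach one label per video folder
    def __init__(self, videos, labels, transform=None, nb_frames=3):
        super().__init__(videos, transform=transform, nb_frames=nb_frames)
        self.labels = labels  # expects len(labels) == len(videos)

    def __getitem__(self, index):
        x = super().__getitem__(index)
        # map the global sample index back to the video it came from
        video_idx = int(np.searchsorted(np.cumsum(self.lens), index, side='right'))
        return x, self.labels[video_idx]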

Hi @ptrblck, I noticed in the source here that the __getitem__ method of the ImageFolder class returns a (sample, target) pair, where sample is the image and target is the class label. In the case of my dataset, how do I handle three images having the same label?

The forward method of my final model looks like this -

def forward(self,image1,image2,image3):
        x1 = self.model1(image1)
        x2 = self.model2(image2)
        x3 = self.model3(image3)
        x4 = torch.cat((x1, x2, x3), dim=1)
        x5 = self.classifier1(F.relu(x4))
        x6 = self.classifier2(F.relu(x5))
        x7 = self.classifier3(F.relu(x6))
        x8 = self.logSoftmax(x7)
        #print(x6.shape)
        return x8

In the train function, the images are fetched like -

for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                .....

If I concatenate the three images into one tensor and pass along a single label, will the model understand that it’s actually three images and treat it as such? I feel a bit lost here.

I would rather write a custom Dataset and return these three images in your __getitem__ method:

def __getitem__(self, index):
    # load the images and your labels according to your code logic
    ...
    return image1, image2, image3, target
    # or concatenate the images and return them as a single tensor
    return images, target
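
For the concatenated variant, one option would be to stack the frames along a new dimension so that each frame keeps its own channels; a small sketch with dummy 3x224x224 frames (the shapes are my assumption, not from the thread):

import torch

image1 = torch.randn(3, 224, 224)  # three hypothetical RGB frames
image2 = torch.randn(3, 224, 224)
image3 = torch.randn(3, 224, 224)

images = torch.stack((image1, image2, image3), dim=0)  # shape: [3, 3, 224, 224]
# after batching this becomes [batch_size, 3, 3, 224, 224], and the frames can be
# recovered inside the training loop via images[:, 0], images[:, 1], images[:, 2]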

I’m not sure I understand this question properly.
Your current code snippet will work, but I’m not sure what your use case is.

My use case is that I’m cropping certain portions of frames from a video and storing them in folders (hence the images have to be from the same folder). Then the three images (which are frames of the same video) are passed through three models before the models join at the fully connected layers.

I was not sure if it would work, thank you.

That should work with your code snippet; however, you could alternatively use a single base model, pass the 3 frames through the same model, and concatenate the outputs afterwards.
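
A minimal sketch of that second option, assuming a single backbone that returns one feature vector per frame (SharedFrameModel, base_model, feat_dim, and num_classes are placeholder names, not from this thread):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedFrameModel(nn.Module):
    def __init__(self, base_model, feat_dim, num_classes):
        super().__init__()
        self.base_model = base_model  # one backbone shared by all three frames
        self.classifier = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, image1, image2, image3):
        # the same weights process each frame; the features are concatenated afterwards
        feats = [self.base_model(img) for img in (image1, image2, image3)]
        x = torch.cat(feats, dim=1)
        return F.log_softmax(self.classifier(F.relu(x)), dim=1)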

Hi @ptrblck,
I tried both return x, target and return x[0], x[1], x[2], target, and the error I got was -

TypeError                                 Traceback (most recent call last)
<ipython-input-83-e03efae61dea> in <module>
----> 1 model, history_model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=10)

<ipython-input-47-f9a9be299e22> in train_model(model, criterion, optimizer, scheduler, num_epochs)
     34                 # track history if only in train
     35                 with torch.set_grad_enabled(phase == 'train'):
---> 36                     outputs = model(inputs)
     37                     #print(outputs.shape)
     38                     _, preds = torch.max(outputs, 1)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

TypeError: forward() missing 2 required positional arguments: 'image2' and 'image3'

in both cases.

Since your forward definition is defined as:

def forward(self,image1,image2,image3)

you should pass the inputs separately or unwrap them:

outputs = model(*inputs)
# or
outputs = model(inputs[0], inputs[1], inputs[2])  # index into the dimension in which you've concatenated the inputs
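
As a side note, if the Dataset returns the three frames separately (as in the earlier __getitem__ sketch), the default collate function batches each of them on its own, so no unwrapping is needed; the training loop would then look roughly like this:

# hypothetical loop for a Dataset whose __getitem__ returns image1, image2, image3, target
for image1, image2, image3, labels in dataloaders[phase]:
    image1, image2, image3 = image1.to(device), image2.to(device), image3.to(device)
    labels = labels.to(device)
    outputs = model(image1, image2, image3)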

Thanks a lot.

Let me try that and get back to you.

For some reason, the size of inputs in model(*inputs) seems to depend on the batch size.

For a batch size of 16, doing model(*inputs) gives me Expected 3 inputs, got 16. I must have made a mistake.

My code snippet was probably wrong.
Could you check the shape of inputs and split it along the right dimension?
E.g. if your inputs have the shape [batch_size, 3, ...], split them in dim1.

Yes the inputs have the shape - torch.Size([16, 3, 224, 224])

What do I put as the split size or sections in torch.split(inputs, split_size_or_sections, dim=1)?

You could use the following (depending on the expected shape inside forward):

outputs = model(*x.split(1, dim=1))  # each input will have shape [16, 1, 224, 224]
outputs = model(x[:, 0], x[:, 1], x[:, 2])  # each input will have shape [16, 224, 224]
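
For reference, a quick shape check of both options on a dummy batch (assuming the frames were concatenated along dim 1, as the reported input shape suggests):

import torch

x = torch.randn(16, 3, 224, 224)      # batch shape reported by the DataLoader
chunks = x.split(1, dim=1)            # three tensors of shape [16, 1, 224, 224]
slices = (x[:, 0], x[:, 1], x[:, 2])  # three tensors of shape [16, 224, 224]
print([c.shape for c in chunks])
print([s.shape for s in slices])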

Using the first option, I get the error


RuntimeError                              Traceback (most recent call last)
<ipython-input-118-e03efae61dea> in <module>
----> 1 model, history_model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=10)

<ipython-input-117-f1d1a61a074d> in train_model(model, criterion, optimizer, scheduler, num_epochs)
     36                 with torch.set_grad_enabled(phase == 'train'):
     37                     #outputs = model(inputs[:, 0], inputs[:, 1], inputs[:, 2])
---> 38                     outputs = model(*inputs.split(1, dim=1))
     39                     #outputs = model(inputs)
     40                     #print(outputs.shape)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

<ipython-input-11-f590b21f7c95> in forward(self, image1, image2, image3)
     11 
     12     def forward(self,image1,image2,image3):
---> 13         x1 = self.model1(image1)
     14         x2 = self.model2(image2)
     15         x3 = self.model3(image3)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torchvision/models/vgg.py in forward(self, x)
     40 
     41     def forward(self, x):
---> 42         x = self.features(x)
     43         x = self.avgpool(x)
     44         x = x.view(x.size(0), -1)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py in forward(self, input)
    336                             _pair(0), self.dilation, self.groups)
    337         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 338                         self.padding, self.dilation, self.groups)
    339 
    340 

RuntimeError: Given groups=1, weight of size 64 3 3 3, expected input[16, 1, 224, 224] to have 3 channels, but got 1 channels instead

This is strange; the tensor originally had a size of [16, 3, 224, 224]. Is the problem in the architecture of the model?

Using the second option, I get a similar error saying the expected tensor is 4d while [16, 224, 224] is 3d.
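
For what it’s worth, both errors point at the same issue: with an input of shape [16, 3, 224, 224], each split piece carries only one channel (or no channel dimension at all), while the VGG backbones expect 3-channel images. Assuming each frame is itself an RGB image, stacking the frames in __getitem__ so the batch has shape [batch_size, 3, 3, 224, 224] would keep the channels intact; a hedged sketch of the shapes:

import torch

# hypothetical frames, each an RGB image of shape [3, 224, 224]
frames = [torch.randn(3, 224, 224) for _ in range(3)]

stacked = torch.stack(frames)                             # [3, 3, 224, 224]: one slot per frame
batch = stacked.unsqueeze(0).expand(16, -1, -1, -1, -1)   # pretend batch of 16
image1, image2, image3 = batch[:, 0], batch[:, 1], batch[:, 2]
print(image1.shape)  # torch.Size([16, 3, 224, 224]) -> 3 channels per frame, as VGG expects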