RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 100 and 25 in dimension 1

Hi,
I want to extract all frames from the videos in my dataset. The dataset contains videos with different numbers of frames. When I tried to extract frames using the following code:

import torch
from torch.utils.data import Dataset
from PIL import Image
import numpy as np
import cv2

class makeDataset(Dataset):
    def __init__(self, dataset, labels, spatial_transform, seqLen):
        self.spatial_transform = spatial_transform
        self.images = dataset
        self.labels = labels
        self.seqLen = seqLen

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        vid_name = self.images[idx]
        label = self.labels[idx]
        inpSeq = []
        vid=cv2.VideoCapture(vid_name)
        index=0
        frame =None
        ret, frame = vid.read()
        while(index<=self.seqLen):
            # Extract images
            ret, frame = vid.read()
            if not ret:
                break
            index+=1
            img=Image.fromarray(frame)
            inpSeq.append(self.spatial_transform(img.convert('RGB')))
        inpSeq = torch.stack(inpSeq, 0)
        return inpSeq, label

I got the following error:

Traceback (most recent call last):

  File "<ipython-input-11-947fd8c9ce15>", line 1, in <module>
    runfile('C:/Users/Windows10/Downloads/CodeF/main-run-vr.py', wdir='C:/Users/Windows10/Downloads/CodeF')

  File "C:\Users\Windows10\Anaconda3\envs\New\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\Windows10\Anaconda3\envs\New\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Windows10/Downloads/CodeF/main-run-vr.py", line 403, in <module>
    __main__()

  File "C:/Users/Windows10/Downloads/CodeF/main-run-vr.py", line 401, in __main__
    evalInterval, evalMode, numWorkers, outDir,modelUsed,pretrained,train_test_split,datasetDir,crossValidation,nFolds)

  File "C:/Users/Windows10/Downloads/CodeF/main-run-vr.py", line 360, in main_run
    modelTrain(folds,modelUsed,pretrained,data,label,class_names,data2,label2,class_names2,numEpochs,evalInterval,evalMode,outDir,numWorkers,lr, stepSize, decayRate, trainBatchSize, seqLen, True)

  File "C:/Users/Windows10/Downloads/CodeF/main-run-vr.py", line 187, in modelTrain
    for i, (inputs, targets) in enumerate(trainLoader):

  File "C:\Users\Windows10\Anaconda3\envs\New\lib\site-packages\torch\utils\data\dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index)  # may raise StopIteration

  File "C:\Users\Windows10\Anaconda3\envs\New\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)

  File "C:\Users\Windows10\Anaconda3\envs\New\lib\site-packages\torch\utils\data\_utils\collate.py", line 80, in default_collate
    return [default_collate(samples) for samples in transposed]

  File "C:\Users\Windows10\Anaconda3\envs\New\lib\site-packages\torch\utils\data\_utils\collate.py", line 80, in <listcomp>
    return [default_collate(samples) for samples in transposed]

  File "C:\Users\Windows10\Anaconda3\envs\New\lib\site-packages\torch\utils\data\_utils\collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 100 and 25 in dimension 1 at C:\w\1\s\tmp_conda_3.6_045031\conda\conda-bld\pytorch_1565412750030\work\aten\src\TH/generic/THTensor.cpp:689

The sequence length dimension must be the same for all videos in a batch. How can I extract all frames from videos that contain different numbers of frames?
Thanks in advance :slight_smile:

You can return a list instead of a tensor if your dimensions are mismatched.
torch.stack requires all tensors to have the same shape.
You can pad the shorter one up to the size of the longer one, but a list seems simpler.

If you return a list, you will get a list of lists of frames.
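
If you go the padding route instead, here is a minimal sketch of a custom collate function, assuming each sample is a (clip, label) pair with clip shaped (T, C, H, W) and an integer label. The name pad_collate is made up for illustration; pad_sequence is from torch.nn.utils.rnn.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate(batch):
    """Collate (clip, label) pairs whose clips differ in frame count.

    Assumes every clip has shape (T_i, C, H, W) with the same C, H, W.
    """
    clips, labels = zip(*batch)
    # Remember the true lengths so padded frames can be ignored later.
    lengths = torch.tensor([clip.shape[0] for clip in clips])
    # Zero-pad along the time dimension up to the longest clip in the batch.
    padded = pad_sequence(list(clips), batch_first=True)  # (B, T_max, C, H, W)
    return padded, lengths, torch.tensor(labels)


# Toy batch with the 100- and 25-frame clips from the error message.
batch = [(torch.ones(100, 3, 8, 8), 0), (torch.ones(25, 3, 8, 8), 1)]
padded, lengths, labels = pad_collate(batch)
```

You would pass it as DataLoader(dataset, batch_size=..., collate_fn=pad_collate) and unpack three values in the training loop. Whether zero padding is acceptable depends on your model, so treat this as a starting point.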

There's something wrong in your spatial transformation function. Are you sure that spatial_transform generates output in the format (B,C,H,W)? Also, self.seqLen can be inferred from the video in OpenCV.
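
On inferring the length: OpenCV reports a frame count through vid.get(cv2.CAP_PROP_FRAME_COUNT), although for some files that value is only an estimate. A rough sketch, assuming you want the same number of frames from every video: compute evenly spaced frame indices from that count. sample_frame_indices is a hypothetical helper, not something from OpenCV or PyTorch.

```python
def sample_frame_indices(total_frames, seq_len):
    """Pick seq_len frame indices spread evenly over [0, total_frames).

    total_frames is assumed to come from something like
    int(vid.get(cv2.CAP_PROP_FRAME_COUNT)) in the Dataset.
    """
    if total_frames <= 0 or seq_len <= 0:
        return []
    if total_frames <= seq_len:
        # Short video: take every frame, then repeat the last one to fill up.
        return list(range(total_frames)) + [total_frames - 1] * (seq_len - total_frames)
    step = total_frames / seq_len
    return [int(i * step) for i in range(seq_len)]


# The two video lengths from the error message, both reduced to 25 frames.
long_video = sample_frame_indices(100, 25)
short_video = sample_frame_indices(25, 25)
```

Inside __getitem__ you could then seek with vid.set(cv2.CAP_PROP_POS_FRAMES, idx) before each vid.read(), so every video yields exactly seq_len frames and the default collate works again.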

Thank you, but how can I turn it back into a tensor? I need to perform some operations on it.

I don’t follow. I think you would do the tensor conversion the way you are doing now, using torch.stack.
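
To make that concrete, here is a small sketch, assuming __getitem__ still returns torch.stack(inpSeq, 0) per video and only the batching changes. list_collate is a hypothetical name; the idea is to keep the clips in a Python list so their lengths may differ, and to work on each clip as an ordinary tensor.

```python
import torch


def list_collate(batch):
    """Keep variable-length clips in a list instead of stacking them."""
    clips, labels = zip(*batch)
    return list(clips), torch.tensor(labels)


# Toy batch: two clips of shape (T, C, H, W) with different T.
batch = [(torch.ones(100, 3, 8, 8), 0), (torch.ones(25, 3, 8, 8), 1)]
clips, labels = list_collate(batch)

# Each clip is still a tensor, so tensor operations work per clip.
# Averaging over time gives a fixed shape, which can be stacked again.
per_clip_means = torch.stack([clip.mean(dim=0) for clip in clips])  # (B, C, H, W)
```

Pass it as collate_fn=list_collate to the DataLoader. Anything that reduces the time dimension to a fixed size (a mean, the last RNN state, and so on) can be stacked back into one batch tensor afterwards.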