Video classification for videos of varying length

I have gone through almost every article I could find on this, and have even tried collate_fn with custom collate functions, but I could not make it work. If someone can help, it would be a lifesaver.

My directory structure is something like this:

root/
  |_ ABCD_1.jpg
  |_ ABCD_2.jpg

Labels are given in the following way:

ABCD | Basketball
EFGH | Football
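
For illustration, label lines in this form could be turned into the `folders` and `labels` lists that the dataset class expects roughly as follows. This is only a sketch: the helper name `parse_labels`, the one-pair-per-line layout, and the alphabetical class indexing are assumptions, not something taken from the actual label file.

```python
def parse_labels(lines):
    """Parse 'VIDEO_ID | ClassName' lines into folder names and integer labels.

    Hypothetical helper: assumes one pair per line, separated by '|'.
    """
    folders, names = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        video_id, class_name = (part.strip() for part in line.split("|", 1))
        folders.append(video_id)
        names.append(class_name)

    # Assign class indices in alphabetical order of the class names
    class_to_idx = {name: i for i, name in enumerate(sorted(set(names)))}
    labels = [class_to_idx[name] for name in names]
    return folders, labels, class_to_idx


folders, labels, class_to_idx = parse_labels([
    "ABCD |  Basketball",
    "EFGH |  Football",
])
```

The integer labels matter because `torch.LongTensor([self.labels[index]])` needs numbers rather than class-name strings.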

My code for loading the data is:

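# for CRNN
import os

import torch
from PIL import Image  # assumed image loader; the post does not show its imports
from torch.utils import data


class Dataset_CRNN(data.Dataset):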
    "Characterizes a dataset for PyTorch"
    def __init__(self, data_path, folders, labels, frames, transform=None):
        self.data_path = data_path
        self.labels = labels
        self.folders = folders
        self.transform = transform
        self.frames = frames

    def __len__(self):
        "Denotes the total number of samples"
        return len(self.folders)

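    def read_images(self, path, selected_folder, use_transform):
        X = []
        for i in self.frames:
            # The loading call was garbled in the post; PIL's Image.open on the
            # joined path is assumed (needs `import os` and `from PIL import Image`).
            image = Image.open(os.path.join(path, selected_folder, '{}_{}.jpg'.format(selected_folder, i)))
            if use_transform is not None:
                image = use_transform(image)
            # Collect the frame; without this, torch.stack fails on an empty list
            X.append(image)

        # Stack frames to shape (T, C, H, W); the transform must return tensors
        X = torch.stack(X, dim=0)

        return X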

    def __getitem__(self, index):
        "Generates one sample of data"
        # Select sample
        folder = self.folders[index]

        # Load data
        X = self.read_images(self.data_path, folder, self.transform)     # (input) spatial images
        y = torch.LongTensor([self.labels[index]])                  # (labels) LongTensor are for int64 instead of FloatTensor

        # print(X.shape)
        return X, y

What are you trying to achieve, what is not working, and where are you stuck at the moment? :)

I have solved this, but I’m stuck with a different issue now. I’ll post it in a different thread.
Thanks a lot!
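
For anyone landing here with the same question: the thread does not say how it was solved, but one common way to batch clips of different lengths is a custom `collate_fn` that zero-pads every clip to the longest one in the batch. The sketch below assumes each sample is `(X, y)` with `X` of shape `(T, C, H, W)` and `y` of shape `(1,)`, as returned by `__getitem__` above; the name `pad_collate` is made up for this example.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate(batch):
    """Zero-pad variable-length clips to the longest clip in the batch.

    Assumes each item is (X, y): X has shape (T, C, H, W) with T varying
    per video, and y is a LongTensor of shape (1,).
    """
    clips, labels = zip(*batch)
    # Keep the true number of frames per clip so padding can be ignored later
    lengths = torch.tensor([clip.shape[0] for clip in clips])
    # Pads along the first (time) dimension -> (B, T_max, C, H, W)
    padded = pad_sequence(list(clips), batch_first=True)
    labels = torch.cat(labels)  # (B,)
    return padded, labels, lengths


# Usage (dataset is an instance of Dataset_CRNN):
# loader = torch.utils.data.DataLoader(dataset, batch_size=4, collate_fn=pad_collate)
```

The returned `lengths` can be passed to `torch.nn.utils.rnn.pack_padded_sequence` so that the recurrent part of a CRNN skips the padded frames.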