Video Classification using Transfer Learning (ResNet 3D) PyTorch

Dear all,
I have .npz files in my dataset. Each file holds a sequence of images (15 frames) “X” and its target “Y”:
video1: [array(img1, img2, img3, …, img10)], [Y1]
video2: [array(img1, img2,img3, …, img10)], [Y2]

I am looking for a way to load this custom data with DataLoader, if that is possible.
Thanks :slight_smile:

Hey -
This could be done in the following way -

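import os

import numpy
import torch
from torch.utils.data import Dataset

class VideoDataset(Dataset):
    def __init__(self, path_to_npy_files):
        # Keep full paths so the files can be opened from any working directory
        self.video_files = [os.path.join(path_to_npy_files, f) for f in sorted(os.listdir(path_to_npy_files))]
        # Anything else here

    def __getitem__(self, idx):
        video_file = self.video_files[idx]
        # video[0] / video[1] assume a saved array; for an .npz archive, index by the stored key names instead
        video = numpy.load(video_file)

        # I am assuming video[0] is of shape D, H, W, C. If not so, please change accordingly
        video_tensor = torch.from_numpy(video[0]).permute(3, 0, 1, 2)  # -> C, D, H, W, as 3D convolutions expect
        label = torch.tensor(video[1], dtype=torch.long)  # I assume you are classifying video frames
        return video_tensor, label

    def __len__(self):
        return len(self.video_files)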

I am not sure what exactly is in the npz file. From what I understood, the npz file stores an array whose first entry is an array containing the images and whose second entry is the class, and I have written the dataset accordingly. If this is not the case, then please change accordingly. Only the video[0] and video[1] part would change.
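To check what an .npz file actually holds, something like the following could be run first. This is only a sketch: the file name, the array sizes, and the key names "X" and "Y" are assumptions made up for the demo, so substitute the path of one of your own files and look at which keys come back.

```python
import os
import tempfile

import numpy as np

# Hypothetical example file: 15 RGB frames of 32x32 pixels plus one integer label.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "video1.npz")
np.savez(path, X=np.zeros((15, 32, 32, 3), dtype=np.float32), Y=np.array(2))

# np.load on an .npz returns a dict-like NpzFile; arrays are looked up by key name.
with np.load(path) as data:
    keys = sorted(data.files)
    shapes = {k: data[k].shape for k in keys}

print(keys)
print(shapes)
```

Whatever keys are printed are the ones to use in place of video[0] and video[1] in the dataset.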

The dataloader can be created as follows -

from torch.utils.data import DataLoader

def GetDataloader(path_to_npy_files, batch_size, num_workers):
    dataset = VideoDataset(path_to_npy_files)
    dataloader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers, shuffle=True)
    return dataloader

PS - Please excuse any indentation errors, I have directly typed the code here.
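For reference, here is a small end-to-end sketch of the same idea written directly for .npz files. It assumes each file stores the frames under the key "X" with shape (D, H, W, C) and the label under the key "Y"; the class name NpzVideoDataset, the key names, and the fake data are all made up for illustration and are not taken from your dataset.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class NpzVideoDataset(Dataset):
    """One .npz file per video: frames under "X", label under "Y" (assumed keys)."""

    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, name)
            for name in os.listdir(root)
            if name.endswith(".npz")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        with np.load(self.paths[idx]) as data:
            frames = torch.from_numpy(data["X"]).float()  # assumed (D, H, W, C)
            label = torch.tensor(int(data["Y"]), dtype=torch.long)
        # Reorder to (C, D, H, W), the per-sample layout 3D convolutions expect.
        return frames.permute(3, 0, 1, 2), label


# Build a tiny fake dataset: 4 videos, 15 frames each, 32x32 RGB.
root = tempfile.mkdtemp()
for i in range(4):
    np.savez(
        os.path.join(root, f"video{i}.npz"),
        X=np.random.rand(15, 32, 32, 3).astype(np.float32),
        Y=np.array(i % 2),
    )

loader = DataLoader(NpzVideoDataset(root), batch_size=2, shuffle=True, num_workers=0)
videos, labels = next(iter(loader))
print(videos.shape, labels.shape)
```

Printing the batch shape once like this is a cheap way to confirm the layout before plugging in the model.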

But I need to use .npz files; each .npz file represents (X, label).
Is it correct like this?

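def npy_loader(path):
    # Assumes the arrays were saved under the keys "X" and "Y"; change the keys to match your files
    with np.load(path) as train_data:
        X = torch.from_numpy(train_data["X"])
        Y = torch.from_numpy(train_data["Y"])
    return X, Y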

dataset = datasets.DatasetFolder(

Oh yeah this works too, I forgot you already have an array of frames :sweat_smile:
So I believe this should work.
Are you facing any trouble with this approach?

With ResNet 3D?
Not yet, I am still on the first task (data preparation).

I am not familiar with ResNet3D, but if it uses 3D convolutions, then the input to the network should have the shape [batch, num_channels, num_frames/depth, H, W].
That is the shape that would get loaded with my dataloader. I am not so familiar with DatasetFolder.
If this is the shape getting loaded, then you are good to go.
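A quick way to sanity-check the shape is to push a dummy batch through a single nn.Conv3d layer. The sizes below (2 videos, 15 frames, 32x32 RGB, 8 output channels) are made up, and the starting layout is assumed to be channel-last; this is just a sketch of the permute, not the ResNet 3D model itself.

```python
import torch
import torch.nn as nn

# A batch as it might come out of a loader that keeps frames channel-last:
# (batch, frames, height, width, channels). Sizes are made up.
batch = torch.rand(2, 15, 32, 32, 3)

# Move channels to dim 1 -> (batch, channels, frames, height, width).
batch = batch.permute(0, 4, 1, 2, 3).contiguous()

# A single 3D convolution; padding=1 with kernel_size=3 keeps D, H, W unchanged.
conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(batch)
print(batch.shape, out.shape)
```

If the channel dimension is left in the wrong place, Conv3d raises a channel-mismatch error, so this check fails fast.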
