I am trying to extract frames from clips in the UCF-101 dataset. When I load the first training fold with the data loader (frames_per_clip = 32), I get 1490434 data points. I was expecting 833 data points, i.e. a tensor of shape (833, 3, 32, H, W). I want to use the R(2+1)D model to extract features from UCF-101, so that I end up with features of shape [833, 512]. Can someone please help me figure out why the number of data points is so much larger than I expected?
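
For reference, here is a rough sketch of how I think the number of data points could be computed, in case it helps pin down the mismatch. It assumes the dataset indexes every sliding window of frames_per_clip frames in each video, moving step_between_clips frames at a time (I believe the torchvision UCF101 dataset defaults to a step of 1, but I have not verified that against the source). The function names and the frame counts below are hypothetical, made up only for illustration.

```python
def count_clips(num_frames, frames_per_clip, step_between_clips=1):
    """Number of sliding-window clips a single video yields (assumed behaviour)."""
    if num_frames < frames_per_clip:
        # A video shorter than one clip contributes nothing.
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1


def total_clips(frame_counts, frames_per_clip, step_between_clips=1):
    """Total dataset length if every window of every video is one data point."""
    return sum(
        count_clips(n, frames_per_clip, step_between_clips) for n in frame_counts
    )


# Hypothetical per-video frame counts, NOT real UCF-101 numbers.
frame_counts = [100, 200, 20]

dense = total_clips(frame_counts, frames_per_clip=32, step_between_clips=1)
sparse = total_clips(frame_counts, frames_per_clip=32, step_between_clips=32)
print(dense, sparse)
```

If this is right, the data point count is a count of overlapping clips rather than of videos, so a step of 1 makes it grow with the total number of frames. Getting one feature row per video would then presumably mean either sampling a single clip per video or averaging the per-clip features of each video.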