Classifying spatio-temporal data/videos

I am a PyTorch newbie. As part of my research I need to build a video classifier and train it on a dataset of different kinds of fluids (videos of fluid flows).
I have previously worked on image classification using PyTorch but I am a bit clueless about how to do the same on videos.
Can you all please suggest some resources/pointers that might be helpful?


Hi @Subhankar_Ghosh,
For video classification you need to combine the information from successive frames. This can be done with sequential models such as RNNs, or with 3D CNNs (where the extra dimension holds the successive frames). “All you need” is to wrap an image classifier in a sequential model, or to extend it with a time dimension. Since you have worked with image classifiers, I’m sure you will get there.
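
To make the "wrap an image classifier in a sequential model" idea a bit more concrete, here is a minimal sketch. The class name `FrameCNNLSTM`, the layer sizes, the 4 classes, and the random input are all made up for illustration; it only assumes clips arrive as tensors of shape (N, T, C, H, W). A small CNN is applied to every frame, and an LSTM then reads the per-frame features in order.

```python
import torch
import torch.nn as nn


class FrameCNNLSTM(nn.Module):
    """Hypothetical example: per-frame CNN features fed to an LSTM."""

    def __init__(self, num_classes, feat_dim=64, hidden_dim=128):
        super().__init__()
        # Small image encoder; in practice this could be any image classifier backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N*T, 32, 1, 1)
            nn.Flatten(),             # (N*T, 32)
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):
        # clips: (N, T, C, H, W) = batch, frames, channels, height, width
        n, t, c, h, w = clips.shape
        # Fold time into the batch so the 2D CNN sees individual frames.
        feats = self.cnn(clips.reshape(n * t, c, h, w))
        feats = feats.reshape(n, t, -1)  # (N, T, feat_dim)
        # Use the last hidden state as a summary of the whole clip.
        _, (h_n, _) = self.lstm(feats)
        return self.fc(h_n[-1])  # (N, num_classes)


model = FrameCNNLSTM(num_classes=4)      # 4 classes is an arbitrary choice
clips = torch.randn(2, 8, 3, 64, 64)     # 2 random clips of 8 RGB frames each
logits = model(clips)
print(logits.shape)  # torch.Size([2, 4])
```

The logits can be trained with `nn.CrossEntropyLoss` just like in image classification; only the input shape and the model change.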

I found these links and this repository helpful:

  1. Large-scale Video Classification with Convolutional Neural Networks - Karpathy
  2. HHTseng’s github repo
  3. RNNs - Karpathy’s blog

Good luck and have fun with your implementation! :slight_smile:

Hi @christopherkuemmel, thanks for the help.

Do you think the Conv3d layers in PyTorch can be used for videos?

@Subhankar_Ghosh For sure. A Conv3d layer expects input of shape (N, C, D, H, W), where N is the batch size, C the number of channels, and D the depth. In your case D is the dimension along which you stack the frames of your video.
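
As a rough sketch of what that looks like in code: `nn.Conv3d` takes input laid out as (N, C, D, H, W), so frames stored as (N, T, C, H, W) need the time and channel axes swapped first. The tensor sizes, channel counts, and the 4 output classes below are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

# 2 random clips, 16 frames each, RGB, 64x64 pixels: (N, T, C, H, W)
frames = torch.randn(2, 16, 3, 64, 64)

# Conv3d wants channels before depth, so move time behind channels: (N, C, D, H, W)
clip = frames.permute(0, 2, 1, 3, 4)

conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(clip)
print(out.shape)  # torch.Size([2, 8, 16, 64, 64])

# A tiny classification head on top: pool over time and space, then a linear layer.
head = nn.Sequential(
    nn.AdaptiveAvgPool3d(1),  # (N, 8, 1, 1, 1)
    nn.Flatten(),             # (N, 8)
    nn.Linear(8, 4),          # 4 classes, arbitrary
)
logits = head(out)
print(logits.shape)  # torch.Size([2, 4])
```

With `kernel_size=3` and `padding=1` the convolution keeps the depth, height, and width unchanged, so the kernel slides over 3 neighbouring frames at a time as well as over space.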