Training a CNN with videos

I’m pretty new to computer vision. I’ve followed the PyTorch transfer learning tutorial, using fine-tuning to create a CNN (
This tutorial uses images. Is there any way to train a CNN on video data? Or, even better, to train it on both videos and images?
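One common approach, sketched here under assumptions: treat a video as a handful of frames. Sample a few frames spread evenly over the clip and pass each one through the same image CNN, such as the fine-tuned model from the tutorial. Then average the per-frame class scores to get one prediction for the whole video. A still image is simply the one-frame case, so a single model can be trained on both images and videos. The helpers `sample_frame_indices` and `average_scores` are hypothetical names used for illustration. Frame decoding and the CNN itself are assumed to happen elsewhere and are not shown.

```python
from typing import List, Sequence


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Hypothetical helper: pick num_samples frame indices spread evenly over a video.

    The video is split into num_samples equal segments and the middle frame of
    each segment is chosen. Integer arithmetic computes floor((i + 0.5) * N / K),
    which always lands in [0, N - 1]. Short videos simply repeat frames.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return [((2 * i + 1) * num_frames) // (2 * num_samples) for i in range(num_samples)]


def average_scores(per_frame_scores: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: average per-frame class scores into one video-level vector."""
    if not per_frame_scores:
        raise ValueError("no frame scores to average")
    num_classes = len(per_frame_scores[0])
    totals = [0.0] * num_classes
    for scores in per_frame_scores:
        if len(scores) != num_classes:
            raise ValueError("all frames must have the same number of classes")
        for c, s in enumerate(scores):
            totals[c] += s
    return [t / len(per_frame_scores) for t in totals]


if __name__ == "__main__":
    # A 100-frame video sampled at 4 points; an image is a 1-frame "video".
    print(sample_frame_indices(100, 4))
    print(sample_frame_indices(1, 1))
    # Scores the CNN might output for two frames, averaged into one prediction.
    print(average_scores([[1.0, 0.0], [0.0, 1.0]]))
```

During training, the averaged scores (or each frame's scores, labelled with the video's class) can feed the usual loss. Averaging ignores the order of the frames. If motion matters for the task, an alternative is a 3D CNN, such as the models in `torchvision.models.video` (e.g. `r3d_18`), which convolves over time as well as space.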