Training a CNN with videos

Hi everyone,
I’m pretty new to computer vision. I’ve followed the PyTorch transfer learning tutorial, which uses fine-tuning to create a CNN (https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html).
This tutorial uses images. Is there any way to train a CNN with video data? Or, even better, to train it with both videos and images?
Thanks