Converting video to frames/images for depth prediction

Hello everyone, I am new here.
I am using PyTorch for depth prediction, following the paper ‘Deeper Depth Prediction with Fully Convolutional Residual Networks’ by Laina et al. I need to take video from an AR Drone 2.0, convert it to frames/images, and then run depth prediction on each frame. Afterwards, the depth map output by the neural network will be fed to RGBD SLAM along with its corresponding RGB image.
Can someone help me with that?
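
For reference, here is a minimal sketch of the video-to-frames step, assuming OpenCV (`cv2`) is installed and the drone video has already been saved to a file. The function names `sample_every`, `read_video`, and `save_frames`, the `frame_000000.png` naming pattern, and the `step` parameter are all my own placeholders, not part of any library or of the paper's code.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sample_every(frames: Iterable[T], step: int = 1) -> Iterator[Tuple[int, T]]:
    """Yield (frame_index, frame) for every `step`-th frame of a sequence."""
    if step < 1:
        raise ValueError("step must be >= 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame


def read_video(video_path: str) -> Iterator:
    """Yield frames (BGR NumPy arrays) from a video file, one at a time."""
    import cv2  # imported here so sample_every works without OpenCV

    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"cannot open video: {video_path}")
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or read error
                break
            yield frame
    finally:
        capture.release()


def save_frames(video_path: str, out_dir: str, step: int = 1) -> int:
    """Write every `step`-th frame as a PNG and return how many were saved."""
    import cv2

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = 0
    for index, frame in sample_every(read_video(video_path), step):
        # Zero-padded index keeps files sorted and lets a depth map be
        # matched to its RGB frame by name later on.
        cv2.imwrite(str(out / f"frame_{index:06d}.png"), frame)
        saved += 1
    return saved
```

Something like `save_frames("flight.mp4", "frames", step=5)` would then keep every fifth frame (the file name is made up). OpenCV returns frames in BGR order, so each one probably needs `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` before it goes to the network. Saving the depth map under the same index as its frame should make pairing them for RGBD SLAM straightforward.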


