MTCNN-PyTorch processing time in still images and videos

I'm trying to run the pre-trained MTCNN model from facenet-pytorch (GitHub - timesler/facenet-pytorch: Pretrained Pytorch face detection (MTCNN) and facial recognition (InceptionResnet) models) on an NVIDIA Jetson device. The code runs on CUDA, but on still images the processing time is around 3-4 seconds per image. When I tested on videos, the speed varied between 1 and 5 FPS. Is there an explanation for this variation? Specifically, why do still images take longer than video frames, and why do videos vary between 1 and 5 FPS even though they all have the same size and quality?
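
To make the numbers comparable, here is a minimal sketch of how the timing could be separated into the first call (which on CUDA usually includes one-time setup cost) and the calls that follow. It is not the code I ran. `time_calls`, `warmup`, and `sync` are hypothetical names introduced for this example, and the detector is passed in as a plain callable so the helper does not depend on any particular library.

```python
import statistics
import time


def time_calls(fn, inputs, warmup=1, sync=None):
    """Time fn(x) for each x in inputs, keeping warm-up calls separate.

    fn     : any callable, e.g. a face detector's detect method (assumption)
    warmup : number of initial calls to report separately
    sync   : optional callable run after fn, e.g. a GPU synchronize function,
             so that asynchronously queued work is included in the timing
    """
    warm, steady = [], []
    for i, x in enumerate(inputs):
        start = time.perf_counter()
        fn(x)
        if sync is not None:
            sync()
        elapsed = time.perf_counter() - start
        # The first `warmup` calls go in one bucket, the rest in another.
        (warm if i < warmup else steady).append(elapsed)
    return {
        "warmup_s": warm,
        "steady_mean_s": statistics.mean(steady) if steady else None,
        "steady_max_s": max(steady) if steady else None,
    }
```

With facenet-pytorch this would presumably be called as `time_calls(mtcnn.detect, frames, sync=torch.cuda.synchronize)`, where `mtcnn` is an `MTCNN` instance created on the CUDA device and `frames` is a list of images. If each still image is processed in a fresh run of the script, every measurement would land in the `warmup_s` bucket, which might account for part of the gap between images and video.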