Why are detection results better on images than on video or a stream with YOLOv5 for the same frame?

I deployed a YOLOv5 object detection model with OpenCV. I got bad results when I ran it on a video, but when I took a screenshot of that video and ran the model on the screenshot, I got a good result. Why are the detection results on images better than on video or a stream for the same frame? How can I improve this?

I would recommend checking how the frames are processed and making sure the video stream goes through exactly the same preprocessing as the single input image (normalization, and likewise things such as resizing and color channel order). It might also be a good idea to save a frame from the video and compare it with the screenshot, checking the shape, dtype, and value ranges of both.
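
As a rough sketch of that comparison, something like the following could be used. It only assumes that both inputs are available as NumPy arrays; the function name `compare_frames` and the keys of the report are made up for illustration and are not part of YOLOv5 or OpenCV.

```python
import numpy as np


def compare_frames(video_frame, still_image):
    """Summarize differences between two frames given as NumPy arrays.

    Hypothetical helper for debugging; not part of any library.
    """
    report = {
        # Different shapes suggest the two inputs were resized differently.
        "shape": (video_frame.shape, still_image.shape),
        # uint8 vs. float hints at one path being normalized and the other not.
        "dtype": (str(video_frame.dtype), str(still_image.dtype)),
        # 0..255 vs. 0..1 is a common normalization mismatch.
        "range": (
            (float(video_frame.min()), float(video_frame.max())),
            (float(still_image.min()), float(still_image.max())),
        ),
    }
    if video_frame.shape == still_image.shape:
        a = video_frame.astype(np.float64)
        b = still_image.astype(np.float64)
        report["mean_abs_diff"] = float(np.abs(a - b).mean())
        # Reversing the last axis swaps BGR and RGB. If this difference is
        # much smaller than the one above, the channel order likely differs.
        report["mean_abs_diff_swapped"] = float(np.abs(a[..., ::-1] - b).mean())
    return report


# Synthetic example: the "still" is the same frame with channels reversed.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[..., 0] = 200
still = frame[..., ::-1].copy()
print(compare_frames(frame, still))
```

With OpenCV, the arrays returned by `cv2.VideoCapture.read()` and `cv2.imread()` could be passed in directly. Both return BGR `uint8` arrays, so if the report looks identical at that stage, the mismatch is more likely introduced later, in whatever preprocessing runs before the model.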