I just upgraded to PyTorch 1.7.0 with CUDA 11.1 and cuDNN 8.0.5.
I was reading the documentation on the torchvision.io fine-grained API and got very interested, but I have a few questions:
- Looking at the `torchvision.set_video_backend(backend)` function and its options: is there a way to determine whether the ffmpeg backend will use the GPU or the CPU? I want to use NVIDIA's GPU-based video decoding, and it is not clear how to enable it.
- In this documentation: https://pytorch.org/docs/stable/torchvision/io.html#video. Can the video source be an RTSP stream (URI) instead of a file name?
- What format would a frame extracted from the video be in? I suppose it would be a tensor? (My rough understanding of the API is sketched below.)
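To make the question concrete, here is roughly how I understand the fine-grained API from the docs. The file name is just a placeholder, and I'm assuming torchvision was built with ffmpeg support for the `video_reader` backend:

```python
import torchvision
from torchvision.io import VideoReader

# The docs list "pyav" and "video_reader" as backends; it is unclear to me
# whether either one can be made to decode on the GPU.
torchvision.set_video_backend("video_reader")

# "video.mp4" is a placeholder file name.
reader = VideoReader("video.mp4", "video")

for frame in reader:
    # Each item is a dict; frame["data"] appears to be a uint8 image tensor
    # and frame["pts"] the presentation timestamp in seconds.
    print(frame["data"].shape, frame["data"].dtype, frame["pts"])
    break
```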
My current deployment does video inference: I use OpenCV's video API to open multiple video streams from various cameras, extract a frame (as an OpenCV-formatted numpy array), convert it to a tensor, and forward it through my models. I am looking to see if I can replace the OpenCV API with torchvision and eliminate a few conversion steps, i.e. extract the frame, keep it in GPU memory as a tensor, and infer directly from it, instead of doing:

video stream GPU decoding (OpenCV/ffmpeg) -> frame extraction to CPU (OpenCV) -> numpy array -> CPU conversion to tensor (PyTorch) -> GPU inference (PyTorch)
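For reference, my current per-frame flow looks roughly like this (the camera URI and model are placeholders):

```python
import cv2
import torch

model = torch.nn.Conv2d(3, 8, 3).cuda()        # stand-in for my real model
cap = cv2.VideoCapture("rtsp://camera/stream") # placeholder camera URI

ret, frame = cap.read()                        # BGR HWC numpy array, on the CPU
if ret:
    t = torch.from_numpy(frame)                # CPU tensor sharing the numpy buffer
    t = t.permute(2, 0, 1).float().div(255)    # HWC -> CHW, scale to [0, 1]
    t = t.unsqueeze(0).cuda()                  # extra copy: CPU -> GPU
    with torch.no_grad():
        out = model(t)                         # inference finally on the GPU
```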
Ideally:
video stream GPU decoding (torchvision) -> frame extraction on the GPU (torchvision) -> GPU inference (PyTorch)
Is this possible?
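In code, what I'm hoping for would look roughly like this, assuming `VideoReader` accepts an RTSP URI and frames could stay on the GPU, both of which are exactly the things I'm unsure about:

```python
import torch
import torchvision
from torchvision.io import VideoReader

torchvision.set_video_backend("video_reader")

model = torch.nn.Conv2d(3, 8, 3).cuda()  # stand-in for my real model

# Whether an RTSP URI is accepted here is part of my question.
reader = VideoReader("rtsp://camera/stream", "video")

with torch.no_grad():
    for frame in reader:
        img = frame["data"].float().div(255).unsqueeze(0)
        # Ideally frame["data"] would already live in GPU memory after
        # NVDEC decoding, making this .cuda() copy unnecessary.
        out = model(img.cuda())
```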