PyTorch 1.7.0 new torchvision API questions

I just upgraded to PyTorch 1.7.0 with CUDA 11.1 and cuDNN 8.0.5.

I was reading the documentation on the fine-grained video API and got very interested, but I have a few questions.

  1. Looking at the `torchvision.set_video_backend(backend)` function and its options: is there a way to determine whether the ffmpeg backend will use the GPU or the CPU? I am looking at using NVIDIA's GPU-based video decoding, and it was not clear how to do this.
  2. In this documentation:
    Can the video source be an RTSP stream (URI) instead of a file name?
  3. What format would a frame extracted from the video be in? I suppose it would be a tensor?

My current deployment relies on video inference. I use OpenCV's video API to open multiple video streams from various cameras, extract a frame (as an OpenCV-formatted numpy array), convert it to a tensor, and then forward it into my models. I am looking to see whether I can replace the OpenCV API with torchvision and eliminate a few conversion steps: basically extract the frame, keep it in GPU memory as a tensor, and infer directly from it, instead of doing:

Video stream GPU decoding (OpenCV/ffmpeg) -> frame extraction to CPU (OpenCV) -> numpy array -> CPU conversion to tensor (PyTorch) -> GPU inference (PyTorch)
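Concretely, the per-frame conversion chain I'm trying to eliminate looks roughly like this (a sketch: the random array stands in for a frame returned by `cv2.VideoCapture.read()`, and the model call is omitted):

```python
import numpy as np
import torch

# Stand-in for a decoded frame: OpenCV hands back an HxWxC uint8
# numpy array in BGR channel order, already copied to CPU memory.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

# The conversion chain described above, all on the CPU:
rgb = frame[:, :, ::-1].copy()                   # BGR -> RGB
tensor = torch.from_numpy(rgb)                   # numpy -> CPU tensor
tensor = tensor.permute(2, 0, 1).float() / 255   # HWC uint8 -> CHW float
if torch.cuda.is_available():
    tensor = tensor.cuda()                       # only now back to the GPU

# model(tensor.unsqueeze(0)) would then run inference on a 1x3x480x640 batch.
```

Every frame takes a GPU -> CPU -> GPU round trip plus a numpy/tensor conversion, which is the overhead I'd like to remove.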

Video stream GPU decoding (torchvision) -> frame extraction in GPU (torchvision) -> GPU inference (pytorch)

Is this possible?


I’m interested in exactly the same thing.

However, if this is possible and you're in a multiprocess environment, I'm afraid you're going to see big spikes in GPU memory usage: every process has to load at least 1 GB of CUDA kernels to use torch on the GPU, and that's not counting the space needed to store the frames you'll be manipulating. Ignore this if that's not your case.

Anyway, I'm currently looking at osai-ai/tensor-stream on GitHub, a library for real-time video stream decoding to CUDA memory. I haven't tested it yet. If it's what you're looking for, please share your results; or if you find another way, please keep me posted.

PS: I suggest changing the name of the topic to be more on point with what you are describing. What about "Video loading and preprocessing fully on GPU", or something like that?

Thanks Felix,

I did explore the option of using tensor-stream as well, but ended up giving up on the idea, as it doesn't appear to be all that efficient GPU-memory-wise. The savings from not moving the frames back and forth between the GPU and system RAM would actually be lost in some of the image manipulations I have to apply to them, which, it seems, are faster on the CPU and more memory-consuming on the GPU.