Issue with torchvision.io.video.read_video

ragnaarok · June 2, 2021, 3:03pm

Hi, I’m trying to use torchvision.io.video.read_video but I’m noticing some weird behavior and I wonder if there’s an issue with this method or if I just don’t understand something. Torchvision version is up to date (0.9.1).

Situation: I want to read a 30-second clip (audio + video) at 25 fps, with audio at 44100 Hz.

I use read_video(pth): everything works fine, output shapes are [750, …], [2, 1323008]
– math: 25 frames/s * 30 s = 750, 44100 samples/s * 30 s = 1323000
I use read_video(pth, pts_unit='sec'): same result
I use read_video(pth, 0, 10, pts_unit='sec'): shapes are [251, …], [2, 12]
I use read_video(pth, 0, 12500, pts_unit='pts'): shapes are [25, …], [2, 13312]

I don’t really understand the last two outputs. I would expect something like [251, …], [2, 441000] and [25, …], [2, 44100].
I would like to be able to read only part of the video.
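
To make the expectation concrete, here is a minimal sketch of the arithmetic, assuming a constant frame rate and sample rate. The helper name expected_counts is hypothetical and not part of torchvision; it only multiplies the clip duration by the two rates.

```python
# Hypothetical helper (not a torchvision function): ideal output sizes for a
# clip, assuming a constant 25 fps video and 44100 Hz audio.
FPS = 25
SAMPLE_RATE = 44100


def expected_counts(duration_sec, fps=FPS, sample_rate=SAMPLE_RATE):
    """Return (video_frames, audio_samples) for a clip of the given length."""
    return round(duration_sec * fps), round(duration_sec * sample_rate)


print(expected_counts(30))  # (750, 1323000)  full clip
print(expected_counts(10))  # (250, 441000)   0 to 10 seconds
print(expected_counts(1))   # (25, 44100)     1 second
```

The 251 frames I get for the 0 to 10 second range are one more than the 250 computed here, which would be consistent with both endpoints being included, but that is my guess.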
Thanks for any help

ragnaarok · June 2, 2021, 3:13pm

Ok, using

torchvision.set_video_backend('video_reader')

solved my issue for the “sec” unit. I now get [251, …], [2, 441344].
The “pts” unit seems to be deprecated, so that case doesn’t matter.
So this seems to be solved.
The documentation says I should compile torchvision from source to use this backend, but I didn’t and it still works. Maybe something will go wrong later.
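
For anyone landing here, this is roughly what I ended up with, as a sketch rather than a definitive recipe. The function names read_clip and expected_audio_samples are my own. The 1024-sample block size is an assumption: I only inferred it from the fact that every audio length I observed (1323008, 441344, 13312) is an exact multiple of 1024.

```python
import math

AUDIO_BLOCK = 1024  # assumption: audio is returned in whole 1024-sample blocks


def expected_audio_samples(duration_sec, sample_rate=44100, block=AUDIO_BLOCK):
    """Round the ideal sample count up to a whole number of audio blocks."""
    return math.ceil(duration_sec * sample_rate / block) * block


def read_clip(path, start_sec, end_sec):
    """Read part of a video using the 'video_reader' backend.

    torchvision is imported here so the arithmetic helper above can be used
    without it. Whether the backend is available may depend on how
    torchvision was built.
    """
    import torchvision
    from torchvision.io import read_video

    torchvision.set_video_backend("video_reader")
    # Times are given in seconds; returns (video, audio, info).
    return read_video(path, start_pts=start_sec, end_pts=end_sec, pts_unit="sec")


print(expected_audio_samples(30))  # 1323008, matches the full-clip audio length
print(expected_audio_samples(10))  # 441344, matches the 10-second audio length
```

Calling read_clip(pth, 0, 10) is the equivalent of my read_video(pth, 0, 10, pts_unit='sec') call above, with the backend switched first.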