I load a video frame by frame with:
from torchvision.io.video import read_video
v, _, _ = read_video(video_path, pts_unit='sec')
Because I feed each frame of the video to a model trained on images, I need to normalize it:
from torchvision import transforms

transform = transforms.Compose([
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
frame = transform(frame)
However, since read_video returns uint8 tensors, Normalize fails with:
ValueError: std evaluated to zero after conversion to torch.uint8, leading to division by zero.
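I think this is what's happening: the std values get converted to the input's dtype, and casting them to uint8 truncates them to zero. A quick check (just illustrating the cast, not my actual pipeline):

```python
import torch

# The ImageNet std values all truncate to 0 when cast to uint8,
# which matches the division-by-zero error above
std = torch.tensor([0.229, 0.224, 0.225])
print(std.to(torch.uint8))  # all zeros
```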
So I cast the frame to float first:
transform = transforms.Compose([
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
frame = frame.float()
frame = transform(frame)
Is doing the type cast here the correct way to handle this in the video case?
For the image case, I load with PIL and use transforms.ToTensor(), so I don't have to worry about integer types.