R3D Expected channels error

Hello,

I’ve been trying to use the pretrained r3d_18 model from PyTorch:

https://pytorch.org/vision/main/models/generated/torchvision.models.video.r3d_18.html#torchvision.models.video.r3d_18

but I keep getting this error:

RuntimeError: Given groups=1, weight of size [64, 3, 3, 7, 7], expected input[1, 1, 3, 112, 112] to have 3 channels, but got 1 channels instead

Yet from what I understand of the error, my input, which is in BTCHW format, does have 3 channels.

Here is the code that leads to this error:

import cv2
import torch
from torchvision.models.video import r3d_18
import torchvision.transforms as transforms

# Load the pretrained R3D-18 model
model = r3d_18(pretrained=True)

# Set the model to evaluation mode
model.eval()

# Open the video file
video_capture = cv2.VideoCapture('pytorch_resnet/videos/cam_int_grab_2.mp4')

while True:
    # Read a frame from the video
    ret, frame = video_capture.read()
    if not ret:
        break

    # Ensure the frame has 3 color channels (RGB)
    if frame.shape[-1] != 3:
        raise ValueError("Input frame should have 3 color channels (RGB)")

    # Resize and central crop
    resize = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((128, 171)),
        transforms.CenterCrop((112, 112)),
        transforms.ToTensor(),
    ])

    # Normalize
    normalize = transforms.Compose([
        transforms.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    ])

    # Preprocess the frame
    frame = cv2.resize(frame, (171, 128))  # Resize to 171x128
    frame = resize(frame)  # Resize and central crop
    frame = normalize(frame)  # Normalize

    # Add batch dimension
    frame = frame.unsqueeze(0)  # Shape will be [1, 3, 112, 112]

    # Check input tensor shape and data type
    print(f"Input tensor shape: {frame.shape}, data type: {frame.dtype}")

    # Perform inference
    with torch.no_grad():
        output = model(frame)  # ERROR HERE
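
The shape mismatch at the marked call can probably be reproduced without the video or the pretrained weights. The sketch below is only an illustration: `stem` is a stand-in for the model's first convolution, built so that its weight has the [64, 3, 3, 7, 7] shape quoted in the error, and its stride and padding values are assumptions. A 4D tensor of shape [1, 3, 112, 112] is not read as (batch, channels, height, width) by a 3D convolution. On recent PyTorch versions it is treated as an unbatched (C, T, H, W) input, so the leading 1 is taken as the channel count, which would explain "got 1 channels". A 5D tensor in (B, C, T, H, W) order passes through.

```python
import torch
import torch.nn as nn

# Stand-in for the first convolution of r3d_18: its weight has the shape
# [64, 3, 3, 7, 7] from the error message. Stride and padding are assumptions.
stem = nn.Conv3d(3, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2),
                 padding=(1, 3, 3), bias=False)

# What the loop in the question passes to the model: one frame, 4 dimensions.
single_frame = torch.randn(1, 3, 112, 112)
error_message = None
try:
    with torch.no_grad():
        stem(single_frame)
except RuntimeError as exc:
    # The exact wording depends on the PyTorch version.
    error_message = str(exc)
print("4D input failed with:", error_message)

# A short clip in (B, C, T, H, W) order: 1 clip, 3 channels, 4 frames.
clip = torch.randn(1, 3, 4, 112, 112)
with torch.no_grad():
    out = stem(clip)
print("5D input output shape:", tuple(out.shape))
```

The point of the comparison is that the channel axis has to be the second of five dimensions, with time as the third.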

Note: I’ve followed the r3d_18 documentation for preprocessing the frames.
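
On the preprocessing itself: as far as the torchvision documentation describes it, the video models take a whole clip per call, not a single frame, and the final tensor is laid out as (B, C, T, H, W). Below is a minimal sketch of that idea, not the library's own transform. `frames_to_clip` is a hypothetical helper, the random `frames` tensor is a synthetic stand-in for decoded video frames, and the 16-frame clip length is an assumption. It uses plain tensor operations in place of the `transforms` pipeline, so the crop offsets may differ by a pixel from `CenterCrop`.

```python
import torch
import torch.nn.functional as F

# Mean and std quoted in the question, shaped to broadcast over (T, C, H, W).
MEAN = torch.tensor([0.43216, 0.394666, 0.37645]).view(1, 3, 1, 1)
STD = torch.tensor([0.22803, 0.22145, 0.216989]).view(1, 3, 1, 1)


def frames_to_clip(frames: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (T, H, W, C) uint8 RGB frames -> (1, C, T, 112, 112) float clip."""
    x = frames.permute(0, 3, 1, 2).float() / 255.0  # (T, C, H, W), values in [0, 1]
    x = F.interpolate(x, size=(128, 171), mode="bilinear", align_corners=False)
    top, left = (128 - 112) // 2, (171 - 112) // 2  # central 112x112 crop
    x = x[:, :, top:top + 112, left:left + 112]
    x = (x - MEAN) / STD                            # per-channel normalization
    return x.permute(1, 0, 2, 3).unsqueeze(0)       # (1, C, T, H, W)


# Synthetic stand-in for 16 decoded frames of size 240x320 (assumed values).
frames = torch.randint(0, 256, (16, 240, 320, 3), dtype=torch.uint8)
clip = frames_to_clip(frames)
print(tuple(clip.shape))
```

With real OpenCV frames, each frame would first need `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` (OpenCV decodes to BGR), and the frames would be collected into one array before conversion, so that `model(clip)` receives several frames at once.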