Hello,
I've been trying to use the pretrained r3d_18 model from PyTorch, but I keep getting this error:
RuntimeError: Given groups=1, weight of size [64, 3, 3, 7, 7], expected input[1, 1, 3, 112, 112] to have 3 channels, but got 1 channels instead
Yet, from what I understand of the error, my input, which is in BTCHW format, has 3 channels.
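In fact, I seem to be able to trigger the same error with nothing but a dummy tensor shaped like one of my preprocessed frames, [1, 3, 112, 112], independent of the video pipeline (minimal sketch, not my actual code):

import torch
from torchvision.models.video import r3d_18

model = r3d_18(pretrained=True).eval()

# Dummy tensor with the same shape as one preprocessed frame: [1, 3, 112, 112]
dummy = torch.randn(1, 3, 112, 112)

with torch.no_grad():
    model(dummy)  # raises the same "expected ... to have 3 channels, but got 1 channels" error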
Here is the code that leads to this error:
import cv2
import torch
from torchvision.models.video import r3d_18
import torchvision.transforms as transforms
# Load the pretrained R3D-18 model
model = r3d_18(pretrained=True)

# Set the model to evaluation mode
model.eval()

# Open the video file
video_capture = cv2.VideoCapture('pytorch_resnet/videos/cam_int_grab_2.mp4')

while True:
    # Read a frame from the video
    ret, frame = video_capture.read()
    if not ret:
        break

    # Ensure the frame has 3 color channels (RGB)
    if frame.shape[-1] != 3:
        raise ValueError("Input frame should have 3 color channels (RGB)")

    # Resize and central crop
    resize = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((128, 171)),
        transforms.CenterCrop((112, 112)),
        transforms.ToTensor(),
    ])

    # Normalize
    normalize = transforms.Compose([
        transforms.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    ])

    # Preprocess the frame
    frame = cv2.resize(frame, (171, 128))  # Resize to 171x128
    frame = resize(frame)                  # Resize and central crop
    frame = normalize(frame)               # Normalize

    # Add batch dimension
    frame = frame.unsqueeze(0)  # Shape will be [1, 3, 112, 112]

    # Check input tensor shape and data type
    print(f"Input tensor shape: {frame.shape}, data type: {frame.dtype}")

    # Perform inference
    with torch.no_grad():
        output = model(frame)  # ERROR HERE
Note: I've followed the r3d_18 documentation for the preprocessing of the frames.
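For comparison, a random clip shaped the way I read the torchvision docs, (B, C, T, H, W) with C=3, goes through the model without complaint (the 16-frame clip length here is just an arbitrary value I picked for the test):

import torch
from torchvision.models.video import r3d_18

model = r3d_18(pretrained=True).eval()

# A dummy clip: (batch, channels, frames, height, width) -- my reading of the expected layout
clip = torch.randn(1, 3, 16, 112, 112)

with torch.no_grad():
    out = model(clip)

print(out.shape)  # torch.Size([1, 400]), one logit per Kinetics-400 class

So I don't understand how my per-frame tensor, which does have 3 channels, ends up being seen as 1-channel input.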