I use OpenCV to read images and then convert them to tensors to feed a DNN for inference, because I need OpenCV to handle the images before the DNN inference. However, when I use transforms to preprocess the images, I get the error: “ValueError: pic should be 2/3 dimensional. Got 4 dimensions.”
The code is:
for index in range(len(datasets)):
    img = datasets[index]
    frame = cv2.imread(img)
    frame = frame * (1/255)
    frame = cv2.resize(frame, (224, 224))
    frame = np.transpose(frame, (2, 0, 1))
    if index == 0:
        frames = frame
        continue
    frames = np.stack((frames, frame))

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
frames = transform(frames).to('cuda')
Then it raises the error “ValueError: pic should be 2/3 dimensional. Got 4 dimensions.” The shape of frames is (2, 3, 224, 224), where 2 is the batch size.
I don’t know how to deal with this error. Thank you for any help.
tcapelle (Thomas Capelle), February 24, 2021, 8:06pm
torch_fresh:

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
The ToTensor transform expects a channel-last image; this works:

    transform(np.random.random((224, 224, 3)))

It is also not expecting a batch. From the docs:

    Convert a PIL Image or numpy.ndarray to tensor.

It expects a single image, not a batch.
Eta_C, March 1, 2021, 6:50am
Let me show some examples.

Example 1: use OpenCV and apply the transform to each image
import cv2
import numpy as np
import torch
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),  # converts HWC uint8 to CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
frames = []
for index in range(2):
    # read image (random data stands in for cv2.imread here)
    frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
    # resize image
    frame = cv2.resize(frame, (224, 224))
    # transform each image individually; ToTensor handles the /255 scaling
    frames.append(transform(frame))
frames = torch.stack(frames)  # shape: (2, 3, 224, 224)
Example 2: use OpenCV and apply the transform after collecting all images
import cv2
import numpy as np
import torch

frames = []
for index in range(2):
    # read image (random data stands in for cv2.imread here)
    frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
    # scale to [0, 1] and resize
    frame = frame * (1 / 255)
    frame = cv2.resize(frame, (224, 224))
    # HWC -> CHW
    frame = np.transpose(frame, (2, 0, 1))
    frames.append(frame)
frames = np.stack(frames, axis=0)  # shape: (2, 3, 224, 224)
# normalize the whole batch with broadcasting
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
frames = (torch.from_numpy(frames) - mean[None, :, None, None]) / std[None, :, None, None]