Error when converting numpy images to tensors

I use OpenCV to read images and then convert them to tensors to feed a DNN for inference. The reason is that I need OpenCV to handle the images before DNN inference. However, when I use transforms to preprocess the images, it gives the error: “ValueError: pic should be 2/3 dimensional. Got 4 dimensions.”
The code is

for index in range(len(datasets)):
    img = datasets[index]
    frame = cv2.imread(img)
    frame = frame * (1/255)
    frame = cv2.resize(frame, (224, 224))
    frame = np.transpose(frame, (2, 0, 1))
    if index == 0:
        frames = frame
        continue
    frames = np.stack((frames, frame))

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

frames = transform(frames).to('cuda')

Then it gives the error “ValueError: pic should be 2/3 dimensional. Got 4 dimensions.” The shape of frames is (2, 3, 224, 224), where 2 is the batch size.

I don’t know how to deal with this error. Thank you for any help.

The ToTensor transform expects a channel-last (H, W, C) image; this works:

transform(np.random.random((224, 224, 3)))

It also does not expect a batch. From the docs:

Convert a PIL Image or numpy.ndarray to tensor.

It expects a single image, not a batch.

Let me show two examples.

example 1: use OpenCV and do transforms for each image

import cv2
import numpy as np
import torch
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),  # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

frames = []
for index in range(2):
    # read image (random data standing in for cv2.imread)
    frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)

    # resize image; ToTensor handles the 1/255 scaling, so no manual division
    frame = cv2.resize(frame, (224, 224))
    frames.append(transform(frame))

frames = torch.stack(frames)
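After the loop, torch.stack adds the batch dimension. A quick check (the device move is guarded, since a GPU may not be available):

```python
import torch

# two CHW images stacked into a single 4D batch
frames = torch.stack([torch.rand(3, 224, 224) for _ in range(2)])
print(frames.shape)  # torch.Size([2, 3, 224, 224])

# move to GPU only if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
frames = frames.to(device)
```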

example 2: use OpenCV and do transform after collecting all images

frames = []
for index in range(2):
    # read image (random data standing in for cv2.imread)
    frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)

    # scale to [0, 1] and resize
    frame = frame.astype(np.float32) / 255
    frame = cv2.resize(frame, (224, 224))
    frame = np.transpose(frame, (2, 0, 1))  # HWC -> CHW
    frames.append(frame)

frames = np.stack(frames, axis=0)
# normalize manually with broadcasting
mean, std = torch.tensor([0.485, 0.456, 0.406]), torch.tensor([0.229, 0.224, 0.225])
frames = (torch.from_numpy(frames) - mean[None, :, None, None]) / std[None, :, None, None]