Input image changes after converting it back from torch.Tensor

Hello there, I can’t seem to get my input RGB image back correctly after converting it to a tensor: the saved result looks like nine smaller pictures that each differ slightly from one another.
Code to reproduce the problem

import cv2
import numpy as np
import torch

# rgb_img is an H x W x 3 uint8 image loaded beforehand
rgb_img = cv2.resize(rgb_img, dsize=(160, 120), interpolation=cv2.INTER_CUBIC)
# shape: (120, 160, 3)
cv2.imwrite("rgb_original.jpg", rgb_img)
rgb_img = np.rollaxis(rgb_img, 2)
# shape: (3, 120, 160)
rgb_img = torch.from_numpy(rgb_img)
rgb_img = rgb_img.type(torch.IntTensor)
rgb_img = rgb_img.to(device="cpu")
rgb_img = rgb_img.unsqueeze(0)  # just to reproduce the batch dimension of the dataloader
# torch.Size([1, 3, 120, 160])
rgb_img = rgb_img.reshape(rgb_img.shape[2], rgb_img.shape[3], rgb_img.shape[1]).cpu().numpy()
cv2.imwrite("rgb_.jpg", rgb_img)

Hi,
Your method is almost OK; you just forgot to account for the permutation of the axes. reshape only reinterprets the flat memory, it does not move the channel axis back to the end, which is why the saved image breaks up into tiles.
Here are two methods to save your image.

import cv2
import numpy as np
import torch
import torchvision.utils as vutils

# reading image
rgb_img = cv2.imread('1.jpg')  # read any image
rgb_img = cv2.resize(rgb_img, dsize=(160, 120), interpolation=cv2.INTER_CUBIC)

# saving original image
cv2.imwrite("rgb_original.jpg", rgb_img)

# using torchvision to save image <- first method
rgb_img_tensor = torch.from_numpy(rgb_img).float() / 255.  # torchvision uses [0,1] range
rgb_img_tensor = rgb_img_tensor.unsqueeze(0).permute(0, 3, 1, 2)
rgb_img_tensor = rgb_img_tensor[:, [2, 1, 0], :, :]  # cv2 reads image in BGR, reversing it to RGB
vutils.save_image(rgb_img_tensor, "first_method.jpg")

# using your method to save image <- second method
rgb_img_orig = np.rollaxis(rgb_img, 2)
rgb_img_orig = torch.from_numpy(rgb_img_orig)
rgb_img_orig = rgb_img_orig.type(torch.IntTensor)
rgb_img_orig = rgb_img_orig.unsqueeze(0)
shape = rgb_img_orig.shape
rgb_img_orig = rgb_img_orig.permute(0, 2, 3, 1).reshape(shape[2], shape[3], shape[1]).cpu().numpy() #  <- permute axes here
cv2.imwrite("second_method.jpg", rgb_img_orig)

:slight_smile:

Worked like a charm =) Thanks!
At some point I had resorted to transforming it back to a PIL Image, which was not the prettiest way; here it is anyway x.x

import torchvision.transforms
import torchvision.transforms.functional as TF

rgb_img = TF.to_tensor(rgb_img)                      # HWC uint8 array -> CHW float tensor in [0, 1]
rgb_img = rgb_img[[2, 1, 0], :, :]                   # cv2 arrays are BGR, so flip to RGB
img = torchvision.transforms.ToPILImage()(rgb_img)   # ToPILImage accepts the float CHW tensor directly
img.save("metamorphosis.jpeg")

On another note, is it a good idea to train a one-class detection model on raw RGB values, or should I follow the usual good practice of scaling and normalizing my tensors?

Hi,
Glad it worked!
I am not sure what you mean by scaling and normalization. If you are asking about scaling the values into the [0, 1] range by dividing the RGB values by 255 (the maximum pixel value), then yes, you should scale. It helps the model converge faster.
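
For example, a minimal sketch of that scaling step, assuming rgb_img is a uint8 H x W x 3 array as in your code above:

import torch

rgb_tensor = torch.from_numpy(rgb_img).float() / 255.  # uint8 [0, 255] -> float32 [0, 1]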

I generally don’t use normalization very often, as it tends to hurt color consistency in image generation tasks (I work on image inpainting). But I have seen some strong arguments for normalization in classification tasks. The commonly used normalization values are the ImageNet statistics, and they work well for most images.
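
If you want to try normalization, here is a minimal sketch of a typical input pipeline using the usual ImageNet statistics (swap in your own dataset’s mean/std if they fit better):

import torchvision.transforms as transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),                            # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet per-channel means
                         std=[0.229, 0.224, 0.225]),  # ImageNet per-channel stds
])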

I would suggest going through some well-known classification architectures to get some pointers on input processing.

Maybe you can start from here.

:slight_smile:

Thanks, I will make sure to train my training abilities x) I will check out the ImageNet normalization values and try them out, to see if I get better results for my network. :yum:
