Bugs with torchvision.transforms.ToPILImage()?

Boyu_Zhang · September 28, 2018, 8:31am

Hi, I was trying to convert an image tensor into a PIL Image and then back again using the following code. However, I found that the image's values were modified by tfs.Compose([tfs.ToPILImage(), tfs.ToTensor()]). Is this a bug in torchvision.transforms.ToPILImage()? Thank you.

from skimage import io, color
from torchvision import transforms as tfs
import numpy as np
import torch


file = 'path/to/image.png'
# rgb2ycbcr returns Y in [16, 235]; dividing by 255 maps it into [0, 1]
img = color.rgb2ycbcr(io.imread(file)) / 255
(rows, cols, channel) = img.shape
# split into the Y, Cb and Cr planes, each of shape (rows, cols, 1)
img_y, img_cb, img_cr = np.split(img, indices_or_sections=channel, axis=2)
tensor_y = torch.from_numpy(img_y).float().view(1, rows, cols)

# float tensor -> PIL Image -> float tensor
trans = tfs.Compose([tfs.ToPILImage(), tfs.ToTensor()])
preds = trans(tensor_y)

print(torch.equal(tensor_y, preds))  # prints False

ptrblck · September 28, 2018, 1:46pm

I think you are just losing precision due to quantization.
If you transform your FloatTensor to a PIL.Image, its values are multiplied by 255 and stored as uint8, i.e. as integers in [0, 255].
This already quantizes your values, since a uint8 can hold only 256 distinct levels and most floating point values in [0, 1] cannot be represented exactly.
The reverse step (ToTensor()) divides by 255 again, so it yields a FloatTensor containing these already quantized values.
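A minimal sketch of what the round trip does to the values, written in plain NumPy so it runs without torchvision. It assumes the float to uint8 step multiplies by 255 and truncates (roughly what `to_pil_image` does with `pic.mul(255).byte()` for float tensors) and that `ToTensor()` divides by 255 again; `simulate_pil_round_trip` is a hypothetical helper, not a torchvision function.

```python
import numpy as np


def simulate_pil_round_trip(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper mimicking ToPILImage() followed by ToTensor().

    Assumption: float -> uint8 multiplies by 255 and truncates,
    and uint8 -> float divides by 255 again.
    """
    as_uint8 = (x * 255).astype(np.uint8)  # truncating cast: only 256 levels survive
    return as_uint8.astype(np.float32) / 255


rng = np.random.default_rng(0)
x = rng.random((1, 4, 4), dtype=np.float32)  # float image in [0, 1), shape (C, H, W)
y = simulate_pil_round_trip(x)

print(np.array_equal(x, y))           # False: values were quantized
print(np.abs(x - y).max() <= 1 / 255)  # True: error is at most one uint8 step
```

Every output value lands on a multiple of 1/255, which is why an exact equality check after the round trip fails even though the transforms behave as designed.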

lrningml · September 28, 2018, 7:21pm

I just ran into the same thing. What's a good way to view a tensor as an image without the quantization?

Boyu_Zhang · September 29, 2018, 4:08am

Thanks for your answer. I looked into the function to_pil_image(pic, mode=None) and found that it explicitly converts a FloatTensor to uint8 values in [0, 255]. I'm wondering whether we can keep the type as float so that there is no precision loss.

Boyu_Zhang · September 29, 2018, 4:12am

Currently, I just convert the tensor into a numpy.ndarray and visualize it with matplotlib, or implement the other image transformations myself on the array to preserve precision. After that, I convert it back to a torch.Tensor.
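A minimal sketch of that workaround, assuming matplotlib is installed; `tensor_y` is a random stand-in for the float Y channel. The data stays float32 the whole time; only the on-screen rendering is limited by the display's bit depth.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import torch

tensor_y = torch.rand(1, 32, 32)  # stand-in float Y channel in [0, 1], shape (1, H, W)

# tensor -> numpy: float32 is kept and the array shares memory with the tensor
img = tensor_y.squeeze(0).numpy()

fig, ax = plt.subplots()
# imshow maps float values straight onto the colormap; vmin/vmax pin the [0, 1] range
ax.imshow(img, cmap="gray", vmin=0.0, vmax=1.0)
plt.close(fig)  # call plt.show() instead to actually display the figure

# ... float-precision processing on `img` would go here ...

back = torch.from_numpy(img).unsqueeze(0)  # numpy -> tensor, shape (1, H, W) again
print(back.dtype, torch.equal(back, tensor_y))  # torch.float32 True
```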

monster · November 18, 2020, 12:16pm

My problem is similar. I am currently using this for augmentation:

from torchvision import transforms

# utils.normalize_transform() is defined outside this snippet
augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

So should I remove transforms.ToTensor()?

ptrblck · November 19, 2020, 7:40am

I don't think ToTensor() is the problem. The lossy step is the conversion to a uint8 image inside ToPILImage(), if your use case matches my previous description.
How did you create these floating point images and what do the values represent?