Torch vision ToPILImage()

Megh_Bhalerao · June 30, 2022, 1:00am

Hi all,

I am trying to convert a real-valued tensor to a PIL image to save to disk. While going through the documentation of to_pil_image, I have a question about the following lines in the pytorch/vision repo:

github.com

pytorch/vision/blob/87cde716b7f108f3db7b86047596ebfad1b88380/torchvision/transforms/functional.py#L288


      
        # if 2D image, add channel dimension (HWC)
        pic = np.expand_dims(pic, 2)

    # check number of channels
    if pic.shape[-1] > 4:
        raise ValueError(f"pic should not have > 4 channels. Got {pic.shape[-1]} channels.")

npimg = pic
if isinstance(pic, torch.Tensor):
    if pic.is_floating_point() and mode != "F":
        pic = pic.mul(255).byte()
    npimg = np.transpose(pic.cpu().numpy(), (1, 2, 0))

if not isinstance(npimg, np.ndarray):
    raise TypeError("Input pic must be a torch.Tensor or NumPy ndarray, not {type(npimg)}")

if npimg.shape[2] == 1:
    expected_mode = None
    npimg = npimg[:, :, 0]
    if npimg.dtype == np.uint8:
        expected_mode = "L"

I do not understand why the image is multiplied by 255. I understand that 255 is the maximum of the standard 8-bit pixel range, but multiplying by 255 only makes sense if the input tensor values lie in [0, 1]. Otherwise the result falls outside [0, 255], and I believe the overflowed values then wrap around when they are cast to 8 bits (roughly val % 256). I observed this in practice when saving a real-valued tensor as a PIL image using the ToPILImage() utility. If this wrapping happens, the saved image is not a correct visualization of the tensor.
The documentation of ToPILImage() states: "Converts a torch.Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range." I am not sure what is meant by value range here, since the GitHub code suggests the value range is not preserved because of these operations on the pixel values. Or am I misunderstanding what preserving the value range means? I think the documentation (ToPILImage — Torchvision main documentation) needs to be clearer about this utility.
Even the save_image utility in torchvision seems to multiply the image by 255 internally (vision/utils.py at 87cde716b7f108f3db7b86047596ebfad1b88380 · pytorch/vision · GitHub), again under the assumption that the input pixel values are in [0, 1], which need not be true. I could not find this assumption stated explicitly anywhere.
Thanks for your help, and please let me know if I am missing something.
– Megh

ptrblck · June 30, 2022, 1:45am

I think your explanation is correct: if you pass a floating point tensor, the default assumption is that the input image is in the range [0, 1] (e.g. the output of ToTensor applied to a uint8 image). The exception is the PIL.Image mode "F", which uses the float32 format and skips the multiplication by 255.

Megh_Bhalerao · July 9, 2022, 11:17pm

Maybe this could be made clearer in the docs? I would be happy to help and open a PR if needed.
Thanks again!

ptrblck · July 10, 2022, 12:46am

Sure, that sounds great! Feel free to create a GitHub issue first explaining the confusion and the lack of documentation, and mention that you would be interested in updating the docs for more clarity, as it could be beneficial for others, too!

Megh_Bhalerao · July 11, 2022, 5:34pm

For reference, this issue has been taken to the pytorch/vision GitHub: Issue in save_image utility in torch vision.utils · Issue #6255 · pytorch/vision · GitHub