How to save images that have more than 3 channels

I am using torchvision.utils.save_image() to save an image that has 5 channels. The image is formed by concatenating 12 images along dim 3. The current shape of the tensor is (16, 5, 128, 1536), with 16 being the batch size.

Code:

r = self.denorm(x_concat.data.cpu())           # denormalize the concatenated batch on the CPU
save_image(r, sample_path, nrow=1, padding=0)  # this call fails for the 5-channel input

The code is giving this error:

Traceback (most recent call last):
  File "main.py", line 137, in <module>
    main(config)
  File "main.py", line 48, in main
    solver.train()
  File "/solver.py", line 366, in train
    save_image(r, sample_path, nrow=1, padding=0)
  File "/torchvision/utils.py", line 129, in save_image
    im = Image.fromarray(ndarr)
  File "/PIL/Image.py", line 2751, in fromarray
    raise TypeError("Cannot handle this data type: %s, %s" % typekey) from e
TypeError: Cannot handle this data type: (1, 1, 5), |u1

I am assuming the error is caused by the 5 channels, since this code previously worked for a set of images with 3 channels.

PyTorch version = 1.7.0
Python version = 3.7.10

I don’t think a native image format using 5 channels exists, so you would not be able to store this type of data as an image. I don’t know what your data represents, but you could store the tensor directly via torch.save.
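
For example, a minimal sketch (assuming r is the denormalized (16, 5, 128, 1536) tensor from above, and that 'sample.pt' is a placeholder path):

import torch

# no common image format supports 5 channels, so save the raw tensor instead
torch.save(r, 'sample.pt')

# load it back later for inspection or further processing
r = torch.load('sample.pt')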


I have a similar issue, which I will bring up in this older post.

I am using a U-Net architecture with hyperspectral data of up to 200 channels, and I am using the U-Net to reduce the number of bands needed for segmentation. Up to 16 class labels are provided, and I have used one-hot encoding to train my model.

The resulting tensor from my decoder has shape [2, 16, 128, 128], where 16 is the one-hot encoding and 128×128 is the image size. I cannot figure out how to save this tensor as a grayscale image.

The following are things I have tried to resolve this:

  1. Take the max value along the channel dimension (axis=1) and then save the image. This results in a tensor of shape [2, 128, 128], and I get RuntimeError: result type Float can’t be cast to the desired output type __int64 (see the sketch after this list).

  2. To fix the data type and change the tensor size to [2, 1, 128, 128] as I would expect for a grayscale conversion, I tried torch.reshape(decoded_imgs, (2,1,128,128)).type(torch.long) and got the same error as before.

  3. Using a different transformation, torchvision.transforms.Grayscale()(decoded_img), I received this: TypeError: Input image tensor permitted channel values are [1, 3], but found 16.

  4. Since the Grayscale() function only accepts 1 or 3 channels, I performed steps 1 and 2 again and then step 3, and got the same error shown in step 1.
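
For reference, a minimal sketch of what attempts 1 and 2 look like in code (assuming decoded_imgs is the [2, 16, 128, 128] decoder output; the reshape is shown on the max result, since the full decoder output has too many elements for that shape):

# attempt 1: max over the channel dimension -> values of shape [2, 128, 128]
max_vals = decoded_imgs.max(dim=1).values

# attempt 2: restore a channel dimension and cast -> [2, 1, 128, 128]
gray = torch.reshape(max_vals, (2, 1, 128, 128)).type(torch.long)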

Any guidance is greatly appreciated.

If the output of your model is a tensor of shape [2, 16, 128, 128] (batch_size, output_channels, height, width), my question is why you need 16 channels.
Anyway, if the max along the channel dimension, original_tensor.max(dim=1, keepdim=True), is what you need but you get that runtime error, your output is probably a logits tensor.
You can try adding a sigmoid function and then a discretization step:

logits = self.forward(image)       # raw model outputs
prob_img = logits.sigmoid()        # map logits to probabilities in [0, 1]
pred_img = (prob_img > 0.5).int()  # threshold to a binary prediction

At this point you should have a binary tensor.
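
If you then want a grayscale image out of it, here is a minimal sketch of one way to do that (assuming prob_img is the [2, 16, 128, 128] probability tensor from the snippet above, and that 'segmentation.png' is a placeholder path):

from torchvision.utils import save_image

# collapse the 16 channels into one class index per pixel: [2, 1, 128, 128]
labels = prob_img.argmax(dim=1, keepdim=True)

# spread the class indices over [0, 1] so each class maps to a distinct gray level
gray = labels.float() / 15.0  # 15 = num_classes - 1

# save_image repeats a single channel to RGB internally, so this writes a grayscale grid
save_image(gray, 'segmentation.png', nrow=1, padding=0)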