I need to load multi-band images into a CNN with PyTorch.
Each image has more than ten channels.
How should I do the image transformation and augmentation, as one would for ordinary RGB images with the `torchvision.transforms` package?
Are there existing packages that can handle multi-band data?
If not, how can I apply RandomCrop to each image? (Every band of an image needs to be cropped at the same random location.)

Most of `torchvision`'s transformations use `PIL` internally. I’m not sure whether `PIL` (or a substitute) can handle 10-channel images, so you could instead apply each transformation channel-wise, use other libraries (e.g. numpy or some scipy package), or write the transformations manually in PyTorch.
Which transformations do you need?
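
As a rough illustration of the "write it manually in PyTorch" option, here is a minimal sketch. It assumes the image is already a float tensor of shape `(C, H, W)`; the function name `random_crop_flip` and the sizes are made up for the example. The key point is that the random numbers are drawn once and then applied to all bands through slicing.

```python
import torch


def random_crop_flip(img, size, generator=None):
    """Crop and maybe flip a (C, H, W) tensor with one set of random parameters."""
    _, h, w = img.shape
    th, tw = size
    # Draw a single crop offset and reuse it for every channel.
    top = torch.randint(0, h - th + 1, (1,), generator=generator).item()
    left = torch.randint(0, w - tw + 1, (1,), generator=generator).item()
    out = img[:, top:top + th, left:left + tw]
    # One coin toss decides the horizontal flip for all channels at once.
    if torch.rand(1, generator=generator).item() < 0.5:
        out = out.flip(-1)
    return out


# Hypothetical 12-band image in which every band holds the same pixel indices,
# so it is easy to check that all bands were cropped at the same location.
img = torch.arange(64 * 64, dtype=torch.float32).view(1, 64, 64).repeat(12, 1, 1)
patch = random_crop_flip(img, (32, 32))
print(patch.shape)
```

Because the slice `img[:, top:..., left:...]` covers the whole channel dimension, no per-channel loop is needed.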

Yes, I am preparing the code for the channel-wise transformations.
I will use Resize, RandomCrop, Normalize, Horizontal/VerticalFlip…
The difficulty is applying the same transformation to every channel: for example, cropping the same area in each channel, or flipping every channel the same way.

To apply the same random transformations, you should use the functional API. Have a look at this post for an example.

Thank you very much. Each channel has a different value range, such as (0, 1) or (0, 17).
So when training/testing the neural network, should we normalize each channel into the same range?
If so, maybe I need to use “(input - mean) / std”? But this formula does not map each channel into the same range either.
Should I first rescale each channel into the range (0, 1), and then use “(input - mean) / std” to bring the data into the range (-1, 1)?

`mean` and `std` usually contain one value per channel, so that each channel is normalized separately.
If you are dealing with e.g. 10 channels, `mean` and `std` should each be a tensor containing 10 values.
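
For illustration, a small sketch of computing and applying such per-channel statistics in plain PyTorch. The dataset here is random, hypothetical data with 10 bands on different scales; in practice the statistics would come from your training set.

```python
import torch

torch.manual_seed(0)

# Hypothetical dataset: 100 images, 10 bands, 16x16 pixels.
# Each band is scaled differently, from about (0, 1) up to about (0, 17).
scales = torch.linspace(1.0, 17.0, 10).view(1, 10, 1, 1)
data = torch.rand(100, 10, 16, 16) * scales

# Reduce over the batch and spatial dimensions: one value per channel.
mean = data.mean(dim=(0, 2, 3))  # shape (10,)
std = data.std(dim=(0, 2, 3))    # shape (10,)

# Broadcast the 10 values over (N, C, H, W) so each band is normalized separately.
normalized = (data - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)
print(mean.shape, std.shape)
```

The same `mean` and `std` could then be passed to `transforms.Normalize(mean, std)`, which performs this subtraction and division on a single `(C, H, W)` tensor.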

Yes, you’re right. If there are 10 channels, `mean` and `std` should each be a tensor containing 10 values.
But this normalization will not bring every channel into the same range, such as (-1, 1) or (0, 1).
Is this OK for training/testing in PyTorch?

I ask because for common RGB data, we first apply transforms.ToTensor(), which scales each channel into the range (0, 1).
Then, if we apply transforms.Normalize(), each channel is mapped into the range (-1, 1).
If my 10-channel data is not in the range (-1, 1) after the transforms, is that OK?

After normalization, each channel of your data will have a standard deviation of 1 (assuming `std` was computed from the data itself). The values are not necessarily in the range `[-1, 1]`. What is the range of your original data? Are you working with `uint8` values?
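
A quick way to see this, using a made-up band with values spread uniformly over (0, 17): after standardization the standard deviation is about 1, but the minimum and maximum lie outside `[-1, 1]`.

```python
import torch

torch.manual_seed(0)

# Hypothetical single band with values in (0, 17).
band = torch.rand(10_000) * 17.0

# Standardize with the band's own statistics.
z = (band - band.mean()) / band.std()

print(z.std().item())                  # close to 1
print(z.min().item(), z.max().item())  # extends beyond -1 and 1
```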

Every channel has a different range: for example, some channels lie in (0, 1) and others in (0, 17).
The data is of float type.

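In that case, `PIL` should probably handle the data successfully using the image mode `F` (32-bit floating point). Note that mode `F` holds a single band, so each channel would be stored as its own image.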

See libvips. Although it is based on C++, it has a Python binding, pyvips.
