Problems with Loading & Processing Multi-band Data - PyTorch

I need to load multi-band images into a CNN via PyTorch.
Each image has more than ten channels.
How should I do the image transformation and augmentation, as is done for normal RGB images with the ‘torchvision.transforms’ package?
Are there any existing packages that can deal with multi-band data?
If not, how can I apply RandomCrop to each image? (Every band of an image needs to be cropped at the same random location.)

Most of torchvision's transformations use PIL internally. I’m not sure if PIL (or a substitute) can handle 10-channel images, so you could instead apply each transformation channel-wise, use other libraries (e.g. numpy or scipy), or write the transformations manually in PyTorch.
Which transformations do you need?

Yeah, I am preparing the code for the channel-wise transformations.
I will use Resize, RandomCrop, Normalize, Horizontal/VerticalFlip…
The trouble is how to apply the same transformation to every channel, e.g. crop the same area in each channel and apply the same flip to each channel.

To apply the same random transformations, you should use the functional API. Have a look at this post for an example.
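A minimal sketch of that idea, assuming each band is available as a separate single-channel PIL image (the crop size and the helper name are just placeholders):

```python
import random
import torch
import torchvision.transforms.functional as TF
from torchvision import transforms

def transform_bands(bands, crop_size=(224, 224)):
    # Draw the random parameters once, then reuse them for every band.
    i, j, h, w = transforms.RandomCrop.get_params(bands[0], output_size=crop_size)
    hflip = random.random() < 0.5
    vflip = random.random() < 0.5

    out = []
    for band in bands:                      # each band is a single-channel PIL image
        band = TF.crop(band, i, j, h, w)    # same crop location for every band
        if hflip:
            band = TF.hflip(band)           # same horizontal flip decision
        if vflip:
            band = TF.vflip(band)           # same vertical flip decision
        out.append(TF.to_tensor(band))
    return torch.cat(out, dim=0)            # shape: (num_bands, H, W)
```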

Thank you very much. Each channel has a different range, such as (0, 1) or (0, 17).
So when we train/test the neural network, should we normalize each channel into the same range?
If so, maybe I need to use “(input - mean) / std”? But this alone does not map each channel into the same range.
Should I first scale the data of each channel into the range (0, 1), and then use “(input - mean) / std” to map the data into the range (-1, 1)?
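For illustration, a rough sketch of that two-step scheme, assuming the bands are stacked into a (C, H, W) float tensor (the channel count and value ranges below are placeholders):

```python
import torch

x = torch.rand(10, 224, 224) * 17.0        # fake 10-band image, values in (0, 17)

# Step 1: scale every channel to (0, 1) using its own min/max.
mins = x.amin(dim=(1, 2), keepdim=True)
maxs = x.amax(dim=(1, 2), keepdim=True)
x01 = (x - mins) / (maxs - mins)

# Step 2: (input - mean) / std with mean=0.5, std=0.5 maps (0, 1) to (-1, 1).
x_norm = (x01 - 0.5) / 0.5
```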

mean and std usually contain a value for each channel, so that each channel is normalized separately.
If you are dealing with e.g. 10 channels, mean and std should each be a tensor containing 10 values.
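For example, a small sketch (the statistics here are placeholders; in practice you would compute per-band mean/std from your dataset):

```python
from torchvision import transforms

# One mean/std entry per band; Normalize works on a (C, H, W) tensor for any C.
normalize = transforms.Normalize(
    mean=[0.5] * 10,
    std=[0.25] * 10,
)
```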

Yes, you’re right. If there are 10 channels, mean and std should be tensors containing 10 values.
But this normalization will not map the data of each channel into the same range, such as (-1, 1) or (0, 1).
Is this OK for PyTorch training/testing?

For common RGB data, when we do normalization, we first apply transforms.ToTensor(), so each channel is scaled into the range (0, 1).
Then, if we use transforms.Normalize() (e.g. with mean 0.5 and std 0.5 per channel), each channel is mapped into the range (-1, 1).
If my 10-channel data is not in the range (-1, 1) after these transforms, is that OK?

After normalization the standard deviation of your data will be 1, but the values are not necessarily in the range [-1, 1]. What is the range of your original data? Are you working with uint8 values?
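A quick sketch illustrating this point with made-up data:

```python
import torch

x = torch.randn(10000) * 5.0 + 3.0         # fake data: mean ~3, std ~5
y = (x - x.mean()) / x.std()               # standardize

print(y.mean().item(), y.std().item())     # approximately 0 and 1
print(y.min().item(), y.max().item())      # typically well outside [-1, 1]
```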

Every channel has a different range; for example, the data of some channels is within (0, 1), and the data of other channels is within (0, 17).
The data is of float type.

In that case, PIL should probably handle the data successfully using the image mode F.
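A small sketch of that idea, assuming each float band is wrapped as its own single-channel PIL image in mode F (the data here is made up):

```python
import numpy as np
from PIL import Image

arr = np.random.rand(12, 256, 256).astype(np.float32) * 17.0   # fake 12-band float data

# Mode 'F' holds a single 32-bit float channel, so each band becomes its own image.
bands = [Image.fromarray(band, mode='F') for band in arr]

# These PIL images can then be fed to the functional transforms shown above.
resized = [b.resize((224, 224), resample=Image.BILINEAR) for b in bands]
```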

See libvips. Although it is written in C/C++, it has a Python binding (pyvips).
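For instance, a minimal sketch with the pyvips binding, assuming a float multi-band file (the file name, crop size and dtype are assumptions):

```python
import numpy as np
import pyvips

img = pyvips.Image.new_from_file("multiband.tif")   # libvips supports arbitrary band counts

# Pick the crop offsets once so all bands are cropped at the same location.
top = np.random.randint(0, img.height - 224 + 1)
left = np.random.randint(0, img.width - 224 + 1)
patch = img.crop(left, top, 224, 224)

# Convert to a (H, W, bands) NumPy array; the dtype must match the file's format.
arr = np.ndarray(
    buffer=patch.write_to_memory(),
    dtype=np.float32,
    shape=[patch.height, patch.width, patch.bands],
)
```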

Thanks for your suggestions.
I have manually written the transformations to process every channel via a ‘for loop’. It can handle the 10-channel data with Resize, RandomCrop, RandomHorizontalFlip, RandomVerticalFlip, ToTensor and Normalize.
However, the processing is slow, which I think is because of the ‘for loop’. Is it possible to speed up the process?
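One possible way to avoid the per-channel Python loop would be to stack the bands into a single (C, H, W) tensor and apply everything to all channels at once; a rough sketch (the crop size, output size and function name are placeholders):

```python
import torch
import torch.nn.functional as F

def transform_stack(x, crop_size=224, out_size=112, mean=None, std=None):
    # x: (C, H, W) float tensor containing all bands of one image.
    c, h, w = x.shape

    # Random crop: one set of offsets shared by every channel.
    top = torch.randint(0, h - crop_size + 1, (1,)).item()
    left = torch.randint(0, w - crop_size + 1, (1,)).item()
    x = x[:, top:top + crop_size, left:left + crop_size]

    # Random horizontal/vertical flips, applied to all channels together.
    if torch.rand(1) < 0.5:
        x = torch.flip(x, dims=[2])
    if torch.rand(1) < 0.5:
        x = torch.flip(x, dims=[1])

    # Resize all channels in one call.
    x = F.interpolate(x.unsqueeze(0), size=out_size, mode='bilinear',
                      align_corners=False).squeeze(0)

    # Per-channel normalization via broadcasting.
    if mean is not None and std is not None:
        x = (x - mean.view(-1, 1, 1)) / std.view(-1, 1, 1)
    return x
```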

Thank you very much. I’ll learn to use it.
As mentioned in my reply above, I have written channel-wise transformations via a ‘for loop’ (Resize, RandomCrop, RandomHorizontalFlip, RandomVerticalFlip, ToTensor and Normalize), but the processing is slow. Is it possible to speed up the process?

The library linked above, vips, claims to have the fastest implementation compared to other libraries.
In addition to the Python binding, there is also a Lua version.