PyTorch Transforms on images/videos with more than 3 channels

Is there an easy way to use PyTorch’s transforms on images/videos with more than 3 channels? Right now I have a video with shape (60, 170, 128, 40), i.e. 60 frames of size 170 x 128, each with 40 channels. It seems that PIL is unable to work with anything that has more than 3 channels.

I currently only need the center crop function, so I can crop the NumPy array directly, but I was wondering whether there is a way to use the torchvision transform functions on images with more than 3 channels, or whether I have to script everything myself.
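For reference, the direct NumPy crop mentioned above could look roughly like the sketch below. It assumes the array is laid out as (frames, height, width, channels), as in the shape given earlier; the function name `center_crop_video` and the 90 x 90 target size are made up for illustration.

```python
import numpy as np


def center_crop_video(video: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Center-crop a (frames, height, width, channels) array along height and width."""
    _, h, w, _ = video.shape
    if out_h > h or out_w > w:
        raise ValueError(f"crop {(out_h, out_w)} exceeds frame size {(h, w)}")
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    # Basic slicing returns a view, so no frame data is copied here.
    return video[:, top:top + out_h, left:left + out_w, :]


# Stand-in for the real video, with the shape from the question.
video = np.zeros((60, 170, 128, 40), dtype=np.float32)
cropped = center_crop_video(video, 90, 90)
print(cropped.shape)  # (60, 90, 90, 40)
```

Because the slice only touches the two spatial axes, the number of channels does not matter.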


Have you tried Kornia? It supports many different augmentations and operates directly on tensors.

For instance, your case can be achieved this way:

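import torch
import kornia as K

# Dummy video tensor laid out as (frames, channels, height, width)
x = torch.rand(60, 40, 170, 128)
# Crop the central 90 x 90 region of every frame
K.center_crop(x, (90, 90)).shape  # torch.Size([60, 40, 90, 90])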

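Note that the tensor in this example is channels-first, (frames, channels, height, width), while the array in the question is channels-last, (frames, height, width, channels). A minimal sketch of the conversion, assuming the video starts out as a NumPy array (the variable names are placeholders):

```python
import numpy as np
import torch

# Stand-in for the real video: (frames, height, width, channels).
video = np.zeros((60, 170, 128, 40), dtype=np.float32)

# Move channels ahead of the spatial axes: (frames, channels, height, width).
# .contiguous() materializes the new layout in memory after the permute.
x = torch.from_numpy(video).permute(0, 3, 1, 2).contiguous()
print(x.shape)  # torch.Size([60, 40, 170, 128])
```

After cropping, `permute(0, 2, 3, 1)` would restore the original channels-last order if the rest of the pipeline expects it.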


Thanks for introducing me to Kornia!