Grayscale image transpose parameters for PyTorch

Hello,
What should the transpose parameters be for a grayscale image that is passed to conv2d?

```python
import os

import cv2
import numpy as np
from tqdm import tqdm

# DATADIR_Train and CATEGORIES_Train are assumed to be defined earlier.
training_data = []

def create_training_data():
    for category in CATEGORIES_Train:  # e.g. dogs and cats
        path = os.path.join(DATADIR_Train, category)  # path to this category's folder
        class_num = CATEGORIES_Train.index(category)  # class index (0, 1, ...)

        for img in tqdm(os.listdir(path)):  # iterate over each image
            try:
                img_array = cv2.imread(os.path.join(path, img))  # load as an (H, W, 3) array
                new_array = cv2.resize(img_array, (100, 100))  # resize to a common size
                new_array = np.transpose(new_array, (2, 0, 1))  # HWC -> CHW
                training_data.append([new_array, class_num])  # add this to our training_data
            except Exception as e:  # unreadable files are skipped to keep the output clean
                pass
            #except OSError as e:
            #    print("OSError, bad img most likely", e, os.path.join(path, img))
            #except Exception as e:
            #    print("general exception", e, os.path.join(path, img))

create_training_data()

print(len(training_data))
```


I guess the current code is permuting the dimensions from HWC to CHW?
If so, you could reuse the same code, assuming your grayscale image has a channel dimension of size 1. If the image only has the two dimensions HW, remove the transpose.
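
The two grayscale cases can be sketched with NumPy alone. This is only an illustration under assumptions: `to_chw` is a hypothetical helper name, and the arrays are synthetic stand-ins for loaded images (for example, `cv2.imread` with `cv2.IMREAD_GRAYSCALE` returns a 2D HW array). Adding a leading axis of size 1 in the HW case is my addition, on the assumption that the layer consuming the image expects a channel dimension.

```python
import numpy as np

def to_chw(image):
    """Return a channels-first (C, H, W) array for a single HW or HWC image.

    Hypothetical helper, for illustration only.
    """
    if image.ndim == 2:
        # (H, W): there is no channel axis to permute, so add one in front.
        return image[np.newaxis, :, :]
    if image.ndim == 3:
        # (H, W, C): same permutation as in the original code.
        return np.transpose(image, (2, 0, 1))
    raise ValueError(f"expected a 2D or 3D array, got {image.ndim} dimensions")

gray_hw = np.zeros((100, 100), dtype=np.uint8)      # grayscale without channel axis
gray_hwc = np.zeros((100, 100, 1), dtype=np.uint8)  # grayscale with channel axis of size 1

print(to_chw(gray_hw).shape)   # (1, 100, 100)
print(to_chw(gray_hwc).shape)  # (1, 100, 100)
```

Both inputs end up with one channel in the first position, which is the layout a channels-first convolution with a single input channel would expect.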

For RGB images, this transpose is not clear to me. If I remove this line, I get an error saying the input has 100 channels. So how does the transposition give the image 3 channels? In your reply to another user you mentioned alpha; what is the alpha dimension?

I really appreciate your help; thanks to you, the concept is becoming clearer to me.

new_array = np.transpose(new_array, (2, 0, 1))

np.transpose permutes the dimensions in the same way torch.permute works. In your use case it is unrelated to the alpha channel, as this would be an additional 4th channel making the input RGBA.
As explained, the code permutes the dimensions from [H, W, C] (channels-last) to [C, H, W] (channels-first).
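
A small NumPy sketch of that permutation, using a synthetic array with deliberately different sizes (H=4, W=5, C=3, chosen only for illustration) so each axis can be told apart:

```python
import numpy as np

H, W, C = 4, 5, 3  # illustrative sizes, not taken from the thread
hwc = np.arange(H * W * C).reshape(H, W, C)  # channels-last image

# (2, 0, 1) means: new axis 0 is old axis 2 (C), new axis 1 is old axis 0 (H),
# new axis 2 is old axis 1 (W).
chw = np.transpose(hwc, (2, 0, 1))

print(hwc.shape)  # (4, 5, 3)
print(chw.shape)  # (3, 4, 5)

# The pixel values are unchanged; only the index order differs.
assert chw[2, 1, 3] == hwc[1, 3, 2]
# Each slice along the first axis of chw is one full channel plane.
assert np.array_equal(chw[0], hwc[:, :, 0])
```

The transpose does not create the 3 channels; they are already the last axis of the loaded image. It only moves that axis to the front. Without it, a 100x100 RGB image keeps the shape (100, 100, 3), and a channels-first layer reads the leading 100 as the channel count, which would explain the error about 100 channels.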


@ptrblck Thanks, I understand now.