Am I applying image normalization correctly here?

I’m not sure whether I’m normalizing images along the right dimension before feeding them into an autoencoder in PyTorch.

I’m using the Omniglot dataset, which contains 20 samples of each of 964 characters, and each image is 105 × 105 × 1 (greyscale).

Before I feed the training data into the autoencoder, it has this shape:

Xtrain.shape = (964, 20, 105, 105)
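
For reference, the first axis indexes the character class and the second indexes the sample within that class, so flattening merges them into a single image axis (a quick sanity check, assuming Xtrain is a NumPy array):

assert Xtrain.shape == (964, 20, 105, 105)
flat = Xtrain.reshape(-1, 105, 105)          # merge (characters, samples) into one axis
assert flat.shape == (964 * 20, 105, 105)    # 19280 individual images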

Here’s how I’m currently calculating the mean and std and applying the transform:

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class OmniglotDataset(Dataset):

    def __init__(self, X, transform=None):
        # flatten (964, 20, 105, 105) -> (19280, 105, 105), one image per row
        self.X = X.reshape(-1, 105, 105)
        self.transform = transform

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        # Normalize expects a (C, H, W) tensor, so add a channel dimension
        img = torch.from_numpy(self.X[idx]).float().unsqueeze(0)
        if self.transform:
            img = self.transform(img)
        return img
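
On its own (no transform yet), indexing the dataset should give a single-channel float tensor, assuming the tensor conversion in __getitem__ above; this is just how I’ve been spot-checking it:

ds = OmniglotDataset(Xtrain)
img = ds[0]
print(img.shape, img.dtype)   # torch.Size([1, 105, 105]) torch.float32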

X1 = Xtrain.reshape(-1, 105, 105)
mean = X1.mean(axis=0).mean()   # mean of the per-pixel mean image
std = X1.mean(axis=0).std()     # std of the per-pixel mean image (this is the step I doubt)
batch_size = 128
img_transform = transforms.Compose([
    transforms.Normalize((mean,), (std,)),
])

train_dataset = OmniglotDataset(Xtrain, transform=img_transform)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
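
To see whether the result looks normalized at all, I pull one batch and check its statistics; my assumption is that a correctly normalized batch should have mean near 0 and std near 1:

batch = next(iter(train_dataloader))
print(batch.shape)                               # expect (128, 1, 105, 105)
print(batch.mean().item(), batch.std().item())   # expect roughly 0 and 1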


  1. Am I applying normalization correctly here?
  2. Is there a faster way to calculate the mean and std? (I sketched the alternative I’ve been considering below.)
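
For question 2, the alternative I’ve been considering is a single pass over all pixels, instead of first averaging over images and then reducing the mean image; I’m not sure whether this is the statistic Normalize actually wants:

mean = X1.mean()   # global mean over every pixel of every image
std = X1.std()     # global std over every pixel of every image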