Am I applying image normalization correctly here?

I’m not sure whether I’m normalizing images along the right dimension before feeding them into an autoencoder in PyTorch.

I’m using the Omniglot dataset, which contains 20 samples of each of 964 characters, and each image is 105 × 105 × 1 (greyscale).

Before I feed the training data into the autoencoder, it has this shape:

Xtrain.shape = (964, 20, 105, 105)
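
For reference, the first axis indexes the character class and the second indexes the sample within that class, so flattening merges them into a single image axis (a quick sanity check, assuming Xtrain is a NumPy array):

assert Xtrain.shape == (964, 20, 105, 105)
flat = Xtrain.reshape(-1, 105, 105)          # merge (characters, samples) into one axis
assert flat.shape == (964 * 20, 105, 105)    # 19280 individual images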

Here’s how I’m currently calculating the mean and std and applying the transform:

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class OmniglotDataset(Dataset):

    def __init__(self, X, transform=None):
        # flatten (964, 20, 105, 105) -> (19280, 105, 105), one image per row
        self.X = X.reshape(-1, 105, 105)
        self.transform = transform

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        # Normalize expects a (C, H, W) tensor, so add a channel dimension
        img = torch.from_numpy(self.X[idx]).float().unsqueeze(0)
        if self.transform:
            img = self.transform(img)
        return img
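
On its own (no transform yet), indexing the dataset should give a single-channel float tensor, assuming the tensor conversion in __getitem__ above; this is just how I’ve been spot-checking it:

ds = OmniglotDataset(Xtrain)
img = ds[0]
print(img.shape, img.dtype)   # torch.Size([1, 105, 105]) torch.float32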

X1 = Xtrain.reshape(-1, 105, 105)
mean = X1.mean(axis=0).mean()   # mean of the per-pixel mean image
std = X1.mean(axis=0).std()     # std of the per-pixel mean image (this is the step I doubt)
batch_size = 128
img_transform = transforms.Compose([
    transforms.Normalize((mean,), (std,)),
])

train_dataset = OmniglotDataset(Xtrain, transform=img_transform)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
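
To see whether the result looks normalized at all, I pull one batch and check its statistics; my assumption is that a correctly normalized batch should have mean near 0 and std near 1:

batch = next(iter(train_dataloader))
print(batch.shape)                               # expect (128, 1, 105, 105)
print(batch.mean().item(), batch.std().item())   # expect roughly 0 and 1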


  1. Am I applying normalization correctly here?
  2. Is there a faster way to calculate the mean and std? (I sketched the alternative I’ve been considering below.)
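
For question 2, the alternative I’ve been considering is a single pass over all pixels, instead of first averaging over images and then reducing the mean image; I’m not sure whether this is the statistic Normalize actually wants:

mean = X1.mean()   # global mean over every pixel of every image
std = X1.std()     # global std over every pixel of every image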