I’m not sure whether I’m normalizing images over the right dimensions before feeding them into an autoencoder in PyTorch.
Dataset:
I’m using the Omniglot dataset, which contains 20 samples of each of 964 characters; each image is 105 × 105 × 1 (greyscale).
Before I feed the training data into the autoencoder, it has this shape:
Xtrain.shape = (964, 20, 105, 105)
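A quick sketch of what flattening that array looks like (the zero array here is just a stand-in with the same shape as my real Xtrain):

```python
import numpy as np

# Placeholder with the same shape as the real Xtrain
Xtrain = np.zeros((964, 20, 105, 105), dtype=np.float32)

# Collapsing the (character, sample) axes yields one image per row
print(Xtrain.reshape(-1, 105, 105).shape)  # (19280, 105, 105) = 964 * 20 images
```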
How I’m currently calculating mean and std and applying the transform:
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class OmniglotDataset(Dataset):
    def __init__(self, X, transform=None):
        # Collapse (characters, samples, H, W) into (characters * samples, H, W)
        self.X = X.reshape(-1, 105, 105)
        self.transform = transform

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        img = self.X[idx]
        if self.transform:
            img = self.transform(img)
        return img
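As a sanity check, I tried the dataset class on dummy data (the random array is a placeholder, not the real Omniglot data):

```python
import numpy as np

dummy = np.random.rand(4, 20, 105, 105).astype(np.float32)  # placeholder data
ds = OmniglotDataset(dummy)  # no transform, so raw arrays come back
print(len(ds))      # 80 images (4 characters * 20 samples)
print(ds[0].shape)  # (105, 105)
```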
X1 = Xtrain.reshape(-1, 105, 105)
mean = X1.mean()  # global mean over every pixel of every image
std = X1.std()    # global std over the same pixels (X1.mean(axis=0).std() gave the std of the per-pixel mean image instead)
# NOTE: if Xtrain holds uint8 values, ToTensor rescales them to [0, 1],
# so the stats should be computed on X1 / 255 to match.
batch_size = 128
img_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((mean,), (std,))
])
train_dataset = OmniglotDataset(Xtrain, transform=img_transform)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
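To verify the whole pipeline, I pull one batch and look at its statistics; if the normalization is right, they should come out near mean 0 and std 1:

```python
batch = next(iter(train_dataloader))
print(batch.shape)  # (128, 1, 105, 105) -- ToTensor adds the channel dim
print(batch.mean().item(), batch.std().item())  # roughly 0 and 1 if the stats are correct
```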
Questions:
- Am I applying normalization correctly here?
- Is there a faster way to calculate mean and std?