Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
So if I do the normalization on each channel by myself, to convert [a,b] to [0,1], I don’t need transforms.ToTensor anymore, right?
But what if each channel of my data has a different range, such as x: -10 to 10, y: 1 to 100, z: 20 to 25 (they actually have some hidden correlation with each other)? How should I normalize them? It doesn’t make sense to normalize them to the same range.
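One common answer is per-channel standardization: instead of forcing all channels into a shared range, subtract each channel's own mean and divide by its own standard deviation. Here is a minimal numpy sketch, assuming the data is an array of shape (N, 3) with the hypothetical ranges from the question:

```python
import numpy as np

# Hypothetical data: three channels with very different ranges
rng = np.random.default_rng(0)
data = np.stack([
    rng.uniform(-10, 10, 1000),   # x: -10 to 10
    rng.uniform(1, 100, 1000),    # y: 1 to 100
    rng.uniform(20, 25, 1000),    # z: 20 to 25
], axis=1)                        # shape (1000, 3)

# Standardize each channel independently: zero mean, unit variance
mean = data.mean(axis=0)
std = data.std(axis=0)
normalized = (data - mean) / std
```

This keeps every channel on a comparable scale without pretending they share the same physical range; any correlation between channels is preserved, just rescaled.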
Can you please add these explanations to the tutorials, perhaps as a footnote? In their current form it is intimidating to see constants popping up without proper explanation. Great work BTW.
I guess that depends on the activation function(s) used. If you are using the sigmoid, you are better off with [0, 1] normalization; if you are using tanh (Tan-Sigmoid), then [-1, 1] normalization will do. The normalization may, on many occasions, affect the time your network needs to converge, as the synaptic weights adapt to the data over time.
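To illustrate the [-1, 1] case: once ToTensor has scaled values to [0, 1], remapping to [-1, 1] is just a shift and scale. This is what `transforms.Normalize(mean=0.5, std=0.5)` computes per channel; a plain numpy sketch of the same arithmetic:

```python
import numpy as np

# Values already scaled to [0, 1], as ToTensor produces
x = np.linspace(0.0, 1.0, 5)

# Remap [0, 1] to [-1, 1]; equivalent to Normalize(mean=0.5, std=0.5)
x_signed = (x - 0.5) / 0.5

print(x_signed)  # [-1.  -0.5  0.   0.5  1. ]
```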
To anybody looking for a more universal solution for custom datasets, this is what worked for me:
import numpy as np

# Note: data must be a numpy.ndarray
# Example data shape: (50000, 32, 32, 3), channels in the last dimension
data = training_set.data
# Compute the per-channel mean and std, then scale them to the range 0..1
mean = np.round(data.mean(axis=(0, 1, 2)) / 255, 4)
std = np.round(data.std(axis=(0, 1, 2)) / 255, 4)
print(f"mean: {mean}\nstd: {std}")
You are directly indexing the internal .data attribute, which contains the raw, unprocessed samples.
If you want the transformations applied, you need to index or iterate the train_set, e.g. via train_set[0].min().
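The distinction can be sketched with a toy class standing in for a torchvision dataset (the real datasets follow the same pattern: `.data` is raw, `__getitem__` applies the transform):

```python
import numpy as np

class ToyDataset:
    """Minimal stand-in for a torchvision dataset with a transform."""
    def __init__(self, data, transform=None):
        self.data = data            # raw, unprocessed samples (0..255)
        self.transform = transform

    def __getitem__(self, idx):
        sample = self.data[idx]
        if self.transform:
            sample = self.transform(sample)
        return sample

raw = np.array([[0.0, 128.0, 255.0]])
train_set = ToyDataset(raw, transform=lambda x: x / 255)

print(train_set.data[0].max())  # 255.0 (raw attribute, transform skipped)
print(train_set[0].max())       # 1.0 (indexing applies the transform)
```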