How to normalize sequence input data

Hi

What is the recommended way to normalize sequence input data (3D tensors)?
Should I use a BatchNorm layer? Create my own transform in the DataLoader?
Is there an example of normalizing sequence data that isn't word embeddings?

Thanks!
Gilad

What kind of normalization are you thinking of applying?

You can compute torch.mean / torch.std over the sequence and normalize it by subtracting the mean and dividing by the standard deviation…
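For example, a minimal sketch (the (batch, seq_len, features) layout and the per-feature statistics are my own assumptions about the data):

```python
import torch

# Hypothetical batch of sequences: (batch, seq_len, features)
x = torch.randn(32, 100, 8)

# Per-feature statistics computed over the batch and time dimensions
mean = x.mean(dim=(0, 1), keepdim=True)
std = x.std(dim=(0, 1), keepdim=True)

# Center and scale; the epsilon guards against division by zero
x_normalized = (x - mean) / (std + 1e-8)
```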

That makes me wonder.
Let's say the input sample is $ x $, with mean $ m $ and standard deviation $ v $.
So we feed the net with $ \frac{x - m}{v} = \frac{x}{v} - \frac{m}{v} $.
Now, if the first layer of the net is a linear layer, it has a bias term (assuming it is enabled).
If a constant such as $ \frac{m}{v} $ needs to be subtracted from all samples, I'd assume the bias will learn it.

So I see the point in dividing, but what's the point in centering?

Yes, the centering "might" be learned by the bias term, if all the stars align (and your biases are initialized close to the right values). Giving a centering prior based on dataset statistics just helps. Scaling can also technically be learned by the convolution operation (the weights can learn to scale up or down), but whether the weights learn a good scaling depends on initialization, how the activation dynamic range changes over the network depth, etc.

Well, I'm not experienced in deep learning.
The question is whether the bias term indeed approaches zero for centered datasets.
I will check on MNIST just for my own knowledge.
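A rough sketch of such a check (the single-layer architecture, the hyperparameters, and the commonly quoted MNIST statistics 0.1307 / 0.3081 are all my own choices, nothing prescribed):

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Center and scale MNIST with the commonly quoted dataset statistics
tfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
train = datasets.MNIST("data", train=True, download=True, transform=tfm)
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

model = nn.Linear(28 * 28, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images.view(images.size(0), -1)), labels)
        loss.backward()
        opt.step()

# Inspect whether the learned biases stay close to zero
print(model.bias.abs().mean())
```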

I have the same question. I can't understand why there aren't more examples of normalizing the inputs (and potentially the outputs). Looking at torchvision.transforms.Normalize, it says it is for normalizing "a tensor image with mean and standard deviation", which I don't think is the same as what we're talking about here.
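For reference, Normalize applies (x - mean) / std per channel to a (C, H, W) image tensor, so its statistics are per-channel rather than per-feature-over-time; a minimal sketch (the ImageNet statistics below are just an illustration):

```python
import torch
from torchvision import transforms

# Normalize expects a (C, H, W) tensor and per-channel mean/std
img = torch.rand(3, 224, 224)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
out = normalize(img)  # each channel: (channel - mean) / std
```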

In scikit-learn you simply add a sklearn.preprocessing.StandardScaler to your pipeline and it normalizes your dataset before training starts. As far as I can see there is no 'built-in' way to do this in PyTorch, or am I missing something?
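For comparison, a rough sketch of the scikit-learn route applied to a 3D sequence array (the reshape to 2D is my own assumption about how you would feed sequences to StandardScaler, which expects 2D input):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sequence data: (n_samples, seq_len, n_features)
data = np.random.randn(1000, 50, 8)

# StandardScaler expects 2D input, so flatten the time dimension
scaler = StandardScaler()
flat = data.reshape(-1, data.shape[-1])  # (n_samples * seq_len, n_features)
normalized = scaler.fit_transform(flat).reshape(data.shape)

# scaler.mean_ and scaler.scale_ keep the parameters for a later inverse_transform
```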

Are there any options other than these:

  1. Manually calculate the mean and standard deviation of your data, normalize it up front, and keep a record of those parameters.
  2. Build the above into your torch.utils.data.Dataset if you are using a custom dataset and data loader (see the first sketch after this list).
  3. For output normalization, attach the parameters to your model and build the conversion back to original units into your model's forward() method, remembering to feed un-normalized target values to your test criterion (see the second sketch after this list).
  4. Or, do all your predictions and test evaluations in normalized units and only convert them back when you finally plot/save the results.
  5. Or import and use sklearn.preprocessing.StandardScaler, as sketched above?
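A minimal sketch of option 2, a hypothetical Dataset that applies precomputed statistics (all names and shapes here are illustrative):

```python
import torch
from torch.utils.data import Dataset

class NormalizedSequenceDataset(Dataset):
    """Hypothetical dataset that normalizes sequences with precomputed statistics."""

    def __init__(self, sequences, targets, mean=None, std=None):
        self.sequences = sequences  # (n_samples, seq_len, n_features)
        self.targets = targets
        # Compute per-feature statistics once, over samples and time steps
        self.mean = sequences.mean(dim=(0, 1)) if mean is None else mean
        self.std = sequences.std(dim=(0, 1)) if std is None else std

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        x = (self.sequences[idx] - self.mean) / (self.std + 1e-8)
        return x, self.targets[idx]

# Fit the statistics on the training set and reuse them for the test set
train_x, train_y = torch.randn(800, 50, 8), torch.randint(0, 2, (800,))
test_x, test_y = torch.randn(200, 50, 8), torch.randint(0, 2, (200,))
train_ds = NormalizedSequenceDataset(train_x, train_y)
test_ds = NormalizedSequenceDataset(test_x, test_y, mean=train_ds.mean, std=train_ds.std)
```

And a sketch of option 3, keeping the target statistics on the model and undoing the normalization inside forward() (the LSTM head is an arbitrary example architecture):

```python
import torch
import torch.nn as nn

class DenormalizingModel(nn.Module):
    """Hypothetical model that converts predictions back to original units."""

    def __init__(self, target_mean, target_std):
        super().__init__()
        self.net = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32, 1)
        # Buffers move with .to(device) and are saved in the state_dict
        self.register_buffer("target_mean", target_mean)
        self.register_buffer("target_std", target_std)

    def forward(self, x):
        out, _ = self.net(x)
        y = self.head(out[:, -1])  # prediction in normalized units
        return y * self.target_std + self.target_mean  # back to original units
```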