Standardization of data

I have a 2D tensor which I want to standardize. Each row contains an instance, and each instance is an array of 400 floats. I want to efficiently use mean/std functions to get the means/stds of all those instances separately, and then use them to standardize my data.

So far I was able (I think) to get means and stds of all instances with this:

means = train_input_data.mean(dim=1)
stds = train_input_data.std(dim=1)

But I don’t know how to apply the subtraction and division to all instances at once. I can do it on one:

train_input_data[0] = (train_input_data[0] - means[0]) / stds[0]

but looping over all instances like that doesn’t seem optimal.

EDIT: Sorry, I answered my own question: PyTorch supports broadcasting like NumPy, you just have to keep the reduced dimension:

means = train_data.mean(dim=1, keepdim=True)
stds = train_data.std(dim=1, keepdim=True)
normalized_data = (train_data - means) / stds
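For anyone landing here later, a minimal self-contained sketch (the tensor name and shape are just placeholders for the data described above) showing the broadcasting and a quick sanity check:

import torch

# dummy data: 100 instances, each an array of 400 floats (placeholder shape)
train_data = torch.randn(100, 400) * 5 + 3

# per-instance mean/std; keepdim=True keeps shape (100, 1) so broadcasting works
means = train_data.mean(dim=1, keepdim=True)
stds = train_data.std(dim=1, keepdim=True)

normalized_data = (train_data - means) / stds

# each row should now have mean ~0 and std ~1
print(normalized_data.mean(dim=1).abs().max())  # close to 0
print(normalized_data.std(dim=1).mean())        # close to 1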

Worked like a charm! Thanks.


Thanks for the code!