4D-1D tensor product

I want to add noise with mean 0 and a fixed std just after batch normalization. The point is that I want to add it before the learnable parameters from batch normalization are applied (multiplied/added), because I want to ensure that the noise is added while the distribution has mean 0 and standard deviation 1.

This can be done by passing affine=False to the nn.BatchNorm2d constructor and then declaring gamma and beta separately. For example:

beta = nn.Parameter(torch.zeros(32))
gamma = nn.Parameter(torch.ones(32))

The problem is that for a tensor of, say, x.shape = (100, 32, 28, 28) (a typical MNIST batch), I have not found a way to multiply each element of gamma and beta by all the elements of its channel. The only way I found (which is quite slow) is:

y = x.clone()  # this is necessary for backward; I cannot operate in place on x
for i in range(32):
    y[:, i] = x[:, i] * gamma[i] + beta[i]

This is really slow. I have investigated the different torch methods (the BLAS implementations) and other options, and I also dug into the batch norm class, but its call into the C/CUDA backend performs the whole batch normalization operation directly. How can I do this efficiently?

Thanks.

Hi,
You can do the following, which makes use of automatic expansion (broadcasting) for element-wise operations when the tensors have the same number of dimensions:

beta = nn.Parameter(torch.zeros(32))
gamma = nn.Parameter(torch.ones(32))

y = x * gamma.view(1, 32, 1, 1) + beta.view(1, 32, 1, 1)

This operation ensures that each gamma[i], i.e. gamma.view(1, 32, 1, 1)[0, i, 0, 0], is multiplied by all of x[:, i, :, :].
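A quick runnable check (a minimal sketch, using a smaller batch than in the question and random gamma/beta so the comparison is meaningful) confirms that the broadcast expression matches the per-channel loop:

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 32, 28, 28)  # smaller stand-in for the (100, 32, 28, 28) batch
gamma = torch.randn(32)
beta = torch.randn(32)

# Broadcast version: one vectorized expression
y_fast = x * gamma.view(1, 32, 1, 1) + beta.view(1, 32, 1, 1)

# Loop version from the question, for comparison
y_loop = x.clone()
for i in range(32):
    y_loop[:, i] = x[:, i] * gamma[i] + beta[i]

print(torch.allclose(y_fast, y_loop))  # True
```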

I also thought about that solution but was not really sure whether it would do what I need.

This operation is equivalent to

y = x * gamma.view(1, 32, 1, 1).expand_as(x) + beta.view(1, 32, 1, 1).expand_as(x)

So yes, every x[:, i, :, :] is going to be multiplied by gamma[i].
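Putting the pieces together, the original goal (noise injected between the normalization and the affine step) can be sketched as a small module. The module name NoisyBatchNorm2d and the noise_std parameter are illustrative, not from the thread:

```python
import torch
import torch.nn as nn

class NoisyBatchNorm2d(nn.Module):
    """Sketch: BatchNorm2d without affine parameters, Gaussian noise added
    while the activations have ~0 mean and ~1 std, then gamma/beta applied
    via broadcasting. Names are illustrative, not from the thread."""
    def __init__(self, num_channels, noise_std=0.1):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)
        self.gamma = nn.Parameter(torch.ones(num_channels))
        self.beta = nn.Parameter(torch.zeros(num_channels))
        self.noise_std = noise_std

    def forward(self, x):
        x = self.bn(x)  # normalized activations, no learnable affine yet
        if self.training:
            x = x + torch.randn_like(x) * self.noise_std
        c = x.shape[1]
        return x * self.gamma.view(1, c, 1, 1) + self.beta.view(1, c, 1, 1)

m = NoisyBatchNorm2d(32)
out = m(torch.randn(8, 32, 28, 28))
print(out.shape)  # torch.Size([8, 32, 28, 28])
```

The broadcast in the last line is the vectorized replacement for the per-channel loop, so the whole forward pass stays on the fast path.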
