I want to add zero-mean noise with a given standard deviation right after batch normalization, but before the learnable batch-norm parameters are applied (the multiply by gamma and the add of beta). I want this because I want to make sure the noise is injected while the distribution still has mean 0 and standard deviation 1.
This can be done by passing affine=False to the nn.BatchNorm2d constructor and then declaring gamma and beta separately. For example:
beta=nn.Parameter(torch.zeros(32))
gamma=nn.Parameter(torch.ones(32))
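To make the setup concrete, here is a minimal self-contained sketch of the layer I have in mind (the 0.1 noise scale is just an illustrative value I picked):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(32, affine=False)   # normalize only, no learnable gamma/beta
beta = nn.Parameter(torch.zeros(32))    # declared separately, as above
gamma = nn.Parameter(torch.ones(32))

x = torch.randn(100, 32, 28, 28)
x_hat = bn(x)                                   # zero mean, unit std per channel
x_hat = x_hat + 0.1 * torch.randn_like(x_hat)   # inject the noise here
# ...and only then apply gamma and beta, which is the step I am asking about
```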
The problem is that for a tensor of, say, x.shape=(100,32,28,28) (a typical MNIST batch), I have not found a way to multiply/add each entry of gamma and beta across all the elements of its channel. The only way I found (which is quite slow) is:
y = x.clone()  # needed for backward; I cannot operate on x in place
for i in range(32):
    y[:, i] = x[:, i] * gamma[i] + beta[i]
This is really slow. I have looked into the various torch methods (including the BLAS-backed ones) and other options, and I also dug into the BatchNorm class, but the call into the C/CUDA backend performs the whole batch-normalization operation directly. How can I do this efficiently?
Thanks.