I want to add zero-centered Gaussian noise that is only active during training. Does PyTorch have this function? Keras has it (the noise layer in Keras).
torch.randn generates noise
Here is my code. Hope it helps.
The known issue is that it slows down my training by about 25%.
class GaussianNoise(nn.Module):
    def __init__(self, stddev):
        super().__init__()
        self.stddev = stddev

    def forward(self, din):
        if self.training:
            return din + torch.autograd.Variable(torch.randn(din.size()).cuda() * self.stddev)
        return din
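A possible way to reduce the slowdown, sketched under the assumption that a reasonably recent PyTorch release is in use (one where `torch.randn_like` exists and `Variable` wrappers are no longer needed): sample the noise directly on the input's device and dtype instead of creating it on the CPU and copying it with `.cuda()`. The class name `GaussianNoiseOnDevice` is made up for this illustration, and the actual speed-up has not been measured here.

```python
import torch
import torch.nn as nn


class GaussianNoiseOnDevice(nn.Module):
    """Additive zero-mean Gaussian noise, active only in training mode.

    Hypothetical variant of the module above: the noise is sampled with
    torch.randn_like, so it is created on the same device and with the
    same dtype as the input (no CPU-to-GPU copy on each call).
    """

    def __init__(self, stddev):
        super().__init__()
        self.stddev = stddev

    def forward(self, din):
        if self.training:
            return din + torch.randn_like(din) * self.stddev
        return din


layer = GaussianNoiseOnDevice(stddev=0.1)
x = torch.zeros(4, 3)

layer.train()
noisy = layer(x)   # noise is added in training mode

layer.eval()
clean = layer(x)   # input is returned unchanged in eval mode
```

Switching between `layer.train()` and `layer.eval()` is what turns the noise on and off, which mirrors the behaviour of the Keras noise layer mentioned in the question.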
Here is what I normally use. The major difference is that I do not move the noise to the GPU on every call, which should speed things up.
class GaussianNoise(nn.Module):
    """Gaussian noise regularizer.

    Args:
        sigma (float, optional): relative standard deviation used to generate the
            noise. Relative means that it will be multiplied by the magnitude of
            the value you are adding the noise to. This means that sigma can be
            the same regardless of the scale of the vector.
        is_relative_detach (bool, optional): whether to detach the variable before
            computing the scale of the noise. If `False` then the scale of the noise
            won't be seen as a constant but something to optimize: this will bias the
            network to generate vectors with smaller values.
    """

    def __init__(self, sigma=0.1, is_relative_detach=True):
        super().__init__()
        self.sigma = sigma
        self.is_relative_detach = is_relative_detach
        self.noise = torch.tensor(0).to(device)

    def forward(self, x):
        if self.training and self.sigma != 0:
            scale = self.sigma * x.detach() if self.is_relative_detach else self.sigma * x
            sampled_noise = self.noise.repeat(*x.size()).normal_() * scale
            x = x + sampled_noise
        return x
Hey @YannDubs1,
Thanks for sharing the code. I am curious to know how I can use it as part of my implementation.
My model has two parts, an Encoder and a Decoder, and I want to add small noise (N(mean = 0, std = 0.1)) to the output of the Encoder, but I don't know how to do that.
Something along those lines:
class Encoder(nn.Module):
    def __init__(self, ....):
        ....
        self.noise = GaussianNoise()

    def forward(self, x):
        ....
        output = self.noise(output)
        return output
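To make the outline above concrete, here is a minimal runnable sketch. The single `nn.Linear` layer, the feature sizes and the names `NoisyEncoder` and `AdditiveGaussianNoise` are placeholders chosen for illustration. The noise layer is a simplified additive version with std = 0.1 (as asked in the question), not the relative-noise module posted earlier.

```python
import torch
import torch.nn as nn


class AdditiveGaussianNoise(nn.Module):
    """Adds N(0, std^2) noise in training mode only (illustrative helper)."""

    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std != 0:
            return x + torch.randn_like(x) * self.std
        return x


class NoisyEncoder(nn.Module):
    """Toy encoder: one linear layer followed by the noise layer."""

    def __init__(self, in_features=8, latent_features=4):
        super().__init__()
        self.net = nn.Linear(in_features, latent_features)
        self.noise = AdditiveGaussianNoise(std=0.1)

    def forward(self, x):
        output = self.net(x)
        output = self.noise(output)  # noise on the encoder output
        return output


encoder = NoisyEncoder()
batch = torch.randn(2, 8)

# Eval mode: the noise layer is a no-op, so repeated calls agree.
encoder.eval()
with torch.no_grad():
    eval_a = encoder(batch)
    eval_b = encoder(batch)

# Training mode: fresh noise is sampled on every call.
encoder.train()
with torch.no_grad():
    train_a = encoder(batch)
    train_b = encoder(batch)
```

Because the noise layer is registered as a submodule, calling `encoder.train()` or `encoder.eval()` on the parent switches it automatically.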
Thanks for your help, but I am still confused about how to add small noise to my network.
The code below is my Generator model. I would like to add small noise to the output of the encoder OR the input of the decoder part of my model. In the code, "down" is the Encoder part and "up" is the Decoder part.
class UnetGenerator(nn.Module):
    def __init__(self, input_nc, output_nc, num_downs, ngf=64, norm_layer=nn.BatchNorm2d, use_dropout=False):
        super(UnetGenerator, self).__init__()
        unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=None, norm_layer=norm_layer, innermost=True)
        for i in range(num_downs - 5):
            unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer, use_dropout=use_dropout)
        unet_block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf, ngf * 2, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        self.model = UnetSkipConnectionBlock(output_nc, ngf, input_nc=input_nc, submodule=unet_block, outermost=True, norm_layer=norm_layer)

    def forward(self, input):
        return self.model(input)

class UnetSkipConnectionBlock(nn.Module):
    def __init__(self, outer_nc, inner_nc, input_nc=None,
                 submodule=None, outermost=False, innermost=False, norm_layer=nn.BatchNorm2d, use_dropout=False):
        super(UnetSkipConnectionBlock, self).__init__()
        self.outermost = outermost
        if type(norm_layer) == functools.partial:
            use_bias = norm_layer.func == nn.InstanceNorm2d
        else:
            use_bias = norm_layer == nn.InstanceNorm2d
        if input_nc is None:
            input_nc = outer_nc
        downconv = nn.Conv2d(input_nc, inner_nc, kernel_size=4,
                             stride=2, padding=1, bias=use_bias)
        downrelu = nn.LeakyReLU(0.2, True)
        downnorm = norm_layer(inner_nc)
        uprelu = nn.ReLU(True)
        upnorm = norm_layer(outer_nc)
        if outermost:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1)
            down = [downconv]
            up = [uprelu, upconv, nn.Tanh()]
            model = down + [submodule] + up
        elif innermost:
            upconv = nn.ConvTranspose2d(inner_nc, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1, bias=use_bias)
            down = [downrelu, downconv]
            up = [uprelu, upconv, upnorm]
            model = down + up
        else:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1, bias=use_bias)
            down = [downrelu, downconv, downnorm]
            up = [uprelu, upconv, upnorm]
            if use_dropout:
                model = down + [submodule] + up + [nn.Dropout(0.5)]
            else:
                model = down + [submodule] + up
        self.model = nn.Sequential(*model)

    def forward(self, x):
        if self.outermost:
            out = self.model(x)
            return out
        else:  # skip connections
            out = torch.cat([x, self.model(x)], 1)
            return out
Thanks in advance!
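One possible place for the noise in this U-Net, offered as a sketch rather than a tested change to the model above: if "output of the encoder" is taken to mean the bottleneck, then in the innermost block the layer list is built as `down + up`, so a noise module can sit between the two (`model = down + [noise] + up`). The reduced block below only mimics that innermost branch. `TinyInnermostBlock`, `AdditiveGaussianNoise` and the channel sizes are hypothetical, and the normalization layers are left out for brevity.

```python
import torch
import torch.nn as nn


class AdditiveGaussianNoise(nn.Module):
    """Adds N(0, std^2) noise in training mode only (illustrative helper)."""

    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std != 0:
            return x + torch.randn_like(x) * self.std
        return x


class TinyInnermostBlock(nn.Module):
    """Reduced stand-in for the innermost U-Net block, with noise at the bottleneck."""

    def __init__(self, outer_nc, inner_nc, noise_std=0.1):
        super().__init__()
        down = [nn.LeakyReLU(0.2),
                nn.Conv2d(outer_nc, inner_nc, kernel_size=4, stride=2, padding=1)]
        up = [nn.ReLU(),
              nn.ConvTranspose2d(inner_nc, outer_nc, kernel_size=4, stride=2, padding=1)]
        # Noise is applied to the encoder ("down") output before the decoder ("up").
        model = down + [AdditiveGaussianNoise(noise_std)] + up
        self.model = nn.Sequential(*model)

    def forward(self, x):
        # Skip connection, as in the non-outermost blocks above.
        return torch.cat([x, self.model(x)], 1)


block = TinyInnermostBlock(outer_nc=8, inner_nc=16)
features = torch.randn(1, 8, 4, 4)

block.eval()
with torch.no_grad():
    out_eval = block(features)

block.train()
with torch.no_grad():
    out_train_a = block(features)
    out_train_b = block(features)
```

Note that the skip connections in the outer blocks bypass the bottleneck, so only the deepest features are perturbed by this placement.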
Thanks for the code, but it gives me the error "normal_cuda not implemented for Long". I had to add .float() to the following line
sampled_noise = self.noise.repeat(*x.size()).normal_() * scale
so that it becomes
sampled_noise = self.noise.repeat(*x.size()).float().normal_() * scale
to resolve the error. But I'm not sure whether it will cause any difference or error.
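For what it's worth, adding .float() should not change the intent: normal_() overwrites the tensor with samples from N(0, 1), so the zeros produced by repeat only provide the shape and dtype. The error arises because torch.tensor(0) is an integer (Long) tensor. A sketch of an alternative, assuming it is acceptable to change the constructor: create the placeholder as a float and register it as a buffer so that it follows the module across devices. The class name `RelativeGaussianNoise` is hypothetical.

```python
import torch
import torch.nn as nn


class RelativeGaussianNoise(nn.Module):
    """Variant of the module above with a float placeholder stored as a buffer."""

    def __init__(self, sigma=0.1, is_relative_detach=True):
        super().__init__()
        self.sigma = sigma
        self.is_relative_detach = is_relative_detach
        # 0.0 (float) instead of 0 (Long), so normal_() is supported.
        # As a buffer it moves with the module on .to(device) / .cuda().
        self.register_buffer("noise", torch.tensor(0.0))

    def forward(self, x):
        if self.training and self.sigma != 0:
            scale = self.sigma * x.detach() if self.is_relative_detach else self.sigma * x
            # repeat() returns a new tensor, which normal_() fills with N(0, 1) samples.
            sampled_noise = self.noise.repeat(*x.size()).normal_() * scale
            x = x + sampled_noise
        return x


layer = RelativeGaussianNoise(sigma=0.1)
x = torch.ones(2, 3, requires_grad=True)

layer.train()
y = layer(x)
y.sum().backward()   # with is_relative_detach=True the noise is a constant for autograd

layer.eval()
y_eval = layer(x)    # unchanged input in eval mode
```

With this change no global `device` variable is needed in the constructor, since the buffer is moved together with the rest of the module.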
You can try:
noise = torch.randn_like(x)
x = ((noise + x).detach() - x).detach() + x
@111329 What is the purpose of doing x = ((noise + x).detach() - x).detach() + x? Doesn't this have the same result as doing x = noise + x, while being computationally more expensive?
The noise is not differentiable, so we should use the STE trick to make the gradients pass through as if there were no noise.
@111329 What is the STE trick? Do you mean the reparameterization trick? If so, I think the code x = noise + x already uses that trick.
The reparameterization trick is basically just to make sure that you don't let the random number generation depend on your learnable parameters in any way (directly or indirectly), which it doesn't do here. That is, the parameters you use to generate the random numbers must be constants. For example, you should not generate random numbers sampled from a distribution with mean x, because then you use x as a parameter of the random number generation, and x in turn depends on your learnable parameters.
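As a small illustration of that idea (a sketch with made-up names `mu`, `sigma` and `eps`, not code from this thread): the random numbers are drawn from a fixed N(0, 1), and the learnable quantities only enter through ordinary arithmetic afterwards, so autograd can differentiate through them.

```python
import torch

# Stand-ins for quantities that depend on learnable parameters.
mu = torch.tensor([1.0, 2.0], requires_grad=True)
sigma = torch.tensor([0.5, 0.1], requires_grad=True)

# The noise comes from a fixed N(0, 1); it does not depend on mu or sigma.
eps = torch.randn(2)

# Reparameterized sample: distributed as N(mu, sigma^2), but differentiable
# with respect to mu and sigma.
z = mu + sigma * eps
z.sum().backward()

# dz/dmu = 1 and dz/dsigma = eps
grad_mu = mu.grad
grad_sigma = sigma.grad
```

Plain additive noise, x = noise + x, is the special case where the scale is a constant, which is why no extra trick is needed there.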
Maybe you can read this paper (I have not read it myself). STE is a basic trick in quantization-aware training.
@111329 Okay, maybe you can say that you are using the STE (although I have never heard of the STE in the context of noise addition; people usually talk about the reparameterization trick there).
But I still don't think the code x = ((noise + x).detach() - x).detach() + x should behave any differently than x = noise + x; could you explain what the difference would be?
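The claim can be checked numerically. The following sketch (variable names are made up; `w` is just an arbitrary weighting so that the gradient is not all ones) runs both expressions on the same input and compares the outputs and the gradients with respect to x.

```python
import torch

torch.manual_seed(0)
w = torch.randn(5)       # arbitrary weights for a non-trivial loss
noise = torch.randn(5)   # constant noise, does not require grad

x1 = torch.randn(5, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)  # independent copy with the same values

# Plain addition.
y_plain = noise + x1

# The detach-based expression from the thread.
y_detach = ((noise + x2).detach() - x2).detach() + x2

(y_plain * w).sum().backward()
(y_detach * w).sum().backward()

# Values agree up to floating-point rounding; gradients are both equal to w.
same_values = torch.allclose(y_plain, y_detach, atol=1e-6)
same_grads = torch.allclose(x1.grad, x2.grad)
```

Both flags come out as `True`: the inner detached term is just a constant (numerically equal to the noise up to rounding), so the two versions differ only in the amount of arithmetic performed.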
You can try using torch.rand directly if torch.rand gives you normal gradients as you wish; otherwise you should use some trick like the STE to solve the backward problem.
My point is that there is no problem with the code x = noise + x. It will work just as well as the code that you provided (and it is much simpler, so it should be preferred). There is no problem that you're solving by detaching, subtracting x, detaching and adding x again.
Also, the straight-through estimator is used when you use a discontinuous activation function, so it doesn't apply to this case because there is no activation function that is being used. I still think that the trick you are thinking of is the reparameterization trick. (Also, the STE is usually referred to as an "estimator", hence the "E" in "STE", and not as a "trick".)
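For contrast, here is a sketch of a case where a straight-through estimator does change something. Rounding (as in quantization-aware training, mentioned earlier in the thread) is piecewise constant, so its gradient under autograd is zero. The detach pattern keeps the rounded value in the forward pass but treats the operation as the identity in the backward pass. The input values are arbitrary.

```python
import torch

x = torch.tensor([0.2, 0.7, 1.4], requires_grad=True)

# Plain rounding: piecewise constant, so the gradient is zero everywhere.
torch.round(x).sum().backward()
grad_plain = x.grad.clone()

x.grad = None  # reset before the second backward pass

# Straight-through estimator: forward value is round(x),
# backward pass behaves as if the operation were the identity.
y_ste = x + (torch.round(x) - x).detach()
y_ste.sum().backward()
grad_ste = x.grad.clone()
```

Here `grad_plain` is all zeros while `grad_ste` is all ones, which is the difference the estimator is meant to produce. Additive noise has no such zero-gradient problem, so the same pattern leaves its gradient unchanged.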