I want to add zero-centered Gaussian noise that is only active during training. Does PyTorch have a function for this? Keras has one (the noise layer in Keras).
torch.randn generates noise
Here is my code. Hope it helps.
The known issue is that it slows down my training by about 25%.
class GaussianNoise(nn.Module):
    def __init__(self, stddev):
        super().__init__()
        self.stddev = stddev

    def forward(self, din):
        if self.training:
            return din + torch.autograd.Variable(torch.randn(din.size()).cuda() * self.stddev)
        return din
Here is what I normally use. The major difference is that I do not move the noise to the GPU at every call, which should speed things up.
class GaussianNoise(nn.Module):
    """Gaussian noise regularizer.

    Args:
        sigma (float, optional): relative standard deviation used to generate the
            noise. Relative means that it will be multiplied by the magnitude of
            the value you are adding the noise to. This means that sigma can be
            the same regardless of the scale of the vector.
        is_relative_detach (bool, optional): whether to detach the variable before
            computing the scale of the noise. If `False` then the scale of the noise
            won't be seen as a constant but something to optimize: this will bias the
            network to generate vectors with smaller values.
    """

    def __init__(self, sigma=0.1, is_relative_detach=True):
        super().__init__()
        self.sigma = sigma
        self.is_relative_detach = is_relative_detach
        self.noise = torch.tensor(0).to(device)

    def forward(self, x):
        if self.training and self.sigma != 0:
            scale = self.sigma * x.detach() if self.is_relative_detach else self.sigma * x
            sampled_noise = self.noise.repeat(*x.size()).normal_() * scale
            x = x + sampled_noise
        return x
Thanks for sharing the code. I am curious to know how I can use it as part of my implementation.
My model has two parts, an Encoder and a Decoder, and I want to add small noise (N(mean = 0, std = 0.1)) to the output of the Encoder, but I don’t know how to do that.
Something along those lines:
class Encoder(nn.Module):
    def __init__(self, ....):
        ....
        self.noise = GaussianNoise()

    def forward(self, x):
        ....
        output = self.noise(output)
        return output
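To make the outline above concrete, here is a minimal runnable sketch. It assumes a toy fully connected encoder (the layer sizes and the `noise_std` argument are placeholders I made up) and an absolute-noise variant of `GaussianNoise` with a fixed standard deviation, which matches the N(0, 0.1) request more directly than the relative `sigma` version. It is only an illustration of the wiring, not the original poster's module.

```python
import torch
import torch.nn as nn


class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise with a fixed std, only in training mode."""

    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std != 0:
            # randn_like samples on x's device and with x's dtype
            return x + torch.randn_like(x) * self.std
        return x


class Encoder(nn.Module):
    """Hypothetical encoder; the layer sizes are placeholders."""

    def __init__(self, in_features=16, latent=4, noise_std=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 8), nn.ReLU(), nn.Linear(8, latent)
        )
        self.noise = GaussianNoise(noise_std)

    def forward(self, x):
        # Noise is applied to the encoder output, before any decoder sees it
        return self.noise(self.net(x))


encoder = Encoder()
x = torch.randn(2, 16)

encoder.train()          # noise active
noisy = encoder(x)

encoder.eval()           # noise disabled
clean_a = encoder(x)
clean_b = encoder(x)
```

Calling `model.train()` or `model.eval()` on the parent model switches the noise on or off, because the `training` flag propagates to all submodules.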
Thanks for your help, but I am still confused about how to add small noise to my network.
The code below is my Generator model. I would like to add small noise to the output of the encoder, or equivalently to the input of the decoder. In the code, ‘down’ is the Encoder part and ‘up’ is the Decoder part.
class UnetGenerator(nn.Module):
    def __init__(self, input_nc, output_nc, num_downs, ngf=64, norm_layer=nn.BatchNorm2d, use_dropout=False):
        super(UnetGenerator, self).__init__()
        unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=None, norm_layer=norm_layer, innermost=True)
        for i in range(num_downs - 5):
            unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer, use_dropout=use_dropout)
        unet_block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf, ngf * 2, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        self.model = UnetSkipConnectionBlock(output_nc, ngf, input_nc=input_nc, submodule=unet_block, outermost=True, norm_layer=norm_layer)

    def forward(self, input):
        return self.model(input)


class UnetSkipConnectionBlock(nn.Module):
    def __init__(self, outer_nc, inner_nc, input_nc=None, submodule=None, outermost=False, innermost=False, norm_layer=nn.BatchNorm2d, use_dropout=False):
        super(UnetSkipConnectionBlock, self).__init__()
        self.outermost = outermost
        if type(norm_layer) == functools.partial:
            use_bias = norm_layer.func == nn.InstanceNorm2d
        else:
            use_bias = norm_layer == nn.InstanceNorm2d
        if input_nc is None:
            input_nc = outer_nc
        downconv = nn.Conv2d(input_nc, inner_nc, kernel_size=4, stride=2, padding=1, bias=use_bias)
        downrelu = nn.LeakyReLU(0.2, True)
        downnorm = norm_layer(inner_nc)
        uprelu = nn.ReLU(True)
        upnorm = norm_layer(outer_nc)

        if outermost:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc, kernel_size=4, stride=2, padding=1)
            down = [downconv]
            up = [uprelu, upconv, nn.Tanh()]
            model = down + [submodule] + up
        elif innermost:
            upconv = nn.ConvTranspose2d(inner_nc, outer_nc, kernel_size=4, stride=2, padding=1, bias=use_bias)
            down = [downrelu, downconv]
            up = [uprelu, upconv, upnorm]
            model = down + up
        else:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc, kernel_size=4, stride=2, padding=1, bias=use_bias)
            down = [downrelu, downconv, downnorm]
            up = [uprelu, upconv, upnorm]
            if use_dropout:
                model = down + [submodule] + up + [nn.Dropout(0.5)]
            else:
                model = down + [submodule] + up

        self.model = nn.Sequential(*model)

    def forward(self, x):
        if self.outermost:
            out = self.model(x)
            return out
        else:
            # skip connections
            out = torch.cat([x, self.model(x)], 1)
            return out
Thanks in advance!
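One possible way to do this, sketched under assumptions: since each block builds its layers as a list `down + up` and wraps them in `nn.Sequential`, a noise module can be placed between the two lists. For the innermost block of the generator above, that would correspond to something like `model = down + [GaussianNoise()] + up`. The snippet below is a self-contained toy version of that idea. The channel counts, the fixed-std `GaussianNoise` class, and the variable `bottleneck` are illustrative and not part of the posted model.

```python
import torch
import torch.nn as nn


class GaussianNoise(nn.Module):
    """Fixed-std Gaussian noise, active only in training mode (illustrative)."""

    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std != 0:
            return x + torch.randn_like(x) * self.std
        return x


# Toy stand-ins for the 'down' (encoder) and 'up' (decoder) lists
down = [nn.LeakyReLU(0.2), nn.Conv2d(8, 16, kernel_size=4, stride=2, padding=1)]
up = [nn.ReLU(), nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1)]

# The noise layer sits between the encoder output and the decoder input
bottleneck = nn.Sequential(*(down + [GaussianNoise(0.1)] + up))

x = torch.randn(1, 8, 16, 16)

bottleneck.train()
out_train = bottleneck(x)      # noisy

bottleneck.eval()
out_eval_a = bottleneck(x)     # no noise
out_eval_b = bottleneck(x)
```

Whether the noise should go only in the innermost block or in every block depends on what counts as "the encoder output" in a U-Net with skip connections, so treat the placement as a design choice.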
Thanks for the code, but it gives me the error “normal_cuda not implemented for Long”. I had to add .float() to the following line
sampled_noise = self.noise.repeat(*x.size()).normal_() * scale
so that it becomes
sampled_noise = self.noise.repeat(*x.size()).float().normal_() * scale
to resolve the error. However, I’m not sure whether this causes any difference or error.
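On the `.float()` question: the error appears because `torch.tensor(0)` creates an integer (`int64`) tensor, and `normal_()` needs a floating-point tensor. Casting to float only changes the dtype of the buffer that is overwritten with samples, so as far as I can tell it should not change the noise. Below is a hedged sketch of a variant that avoids the problem by sampling with `torch.randn_like`, which also removes the dependency on a global `device`. It is an adaptation for illustration, not the original author's code.

```python
import torch
import torch.nn as nn

# The root cause: an integer literal gives an integer tensor
int_buffer = torch.tensor(0)      # dtype torch.int64
float_buffer = torch.tensor(0.0)  # dtype torch.float32


class GaussianNoise(nn.Module):
    """Relative Gaussian noise; samples directly in x's dtype and device."""

    def __init__(self, sigma=0.1, is_relative_detach=True):
        super().__init__()
        self.sigma = sigma
        self.is_relative_detach = is_relative_detach

    def forward(self, x):
        if self.training and self.sigma != 0:
            scale = self.sigma * x.detach() if self.is_relative_detach else self.sigma * x
            x = x + torch.randn_like(x) * scale
        return x


torch.manual_seed(0)
layer = GaussianNoise(sigma=0.1)
layer.train()

x = torch.ones(100_000, requires_grad=True)
out = layer(x)

# With x == 1 everywhere, the added noise should have std close to sigma
noise_std = (out - x).detach().std().item()

# With is_relative_detach=True the noise is a constant w.r.t. x,
# so the gradient of out.sum() is exactly 1 per element
out.sum().backward()
```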
You can try
noise = torch.randn_like(x)
x = ((noise + x).detach() - x).detach() + x
@111329 What is the purpose of doing x = ((noise + x).detach() - x).detach() + x? Doesn’t this have the same result as doing x = noise + x, while being computationally more expensive?
The noise is not differentiable, so we should use the STE trick so that gradients pass through as if there were no noise.
@111329 What is the STE trick? Do you mean the reparameterization trick? If so, I think the code x = noise + x already uses that trick.
The reparameterization trick is basically just to make sure that you don’t let the random number generation depend on your learnable parameters in any way (directly or indirectly), which it doesn’t do here. That is, you need to let the parameters you use to generate the random numbers be constants and, for example, not generate random numbers sampled from a distribution with mean x, because then you use x as a parameter to the random number generation, and x in turn depends on your learnable parameters.
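A small sketch of that idea, with made-up parameter names `mu` and `log_sigma`: the random draw `eps` comes from a fixed N(0, 1) that does not depend on any parameter, and the parameters enter only through a deterministic, differentiable transformation. For comparison, `torch.distributions.Normal` exposes both behaviors: `sample()` draws without tracking gradients, while `rsample()` uses the reparameterized form.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

# Hypothetical learnable parameters of a Gaussian
mu = torch.tensor([1.0, -2.0], requires_grad=True)
log_sigma = torch.tensor([0.0, 0.5], requires_grad=True)

# Reparameterized draw: the randomness is parameter-free
eps = torch.randn(2)
z = mu + log_sigma.exp() * eps
z.sum().backward()

# Gradients flow to both parameters through the deterministic transform
expected_log_sigma_grad = (log_sigma.exp() * eps).detach()

# Same distinction in torch.distributions
dist = Normal(mu, log_sigma.exp())
direct = dist.sample()    # drawn without gradient tracking
reparam = dist.rsample()  # reparameterized, differentiable w.r.t. mu and sigma
```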
Maybe you can read this paper (I have not read it myself). STE is a basic trick in quantization-aware training.
@111329 Okay, maybe you can call it the STE, although I have never heard of the STE in the context of noise addition; people usually talk about the reparameterization trick.
But I still don’t think the code x = ((noise + x).detach() - x).detach() + x should behave any differently than x = noise + x; could you explain what the difference would be?
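One way to check this empirically is to compute both expressions with the same noise and compare outputs and gradients. The sketch below uses an arbitrary parameter `w` and a squared-sum loss, both chosen only for illustration. Since `noise` does not depend on `w`, the derivative of either expression with respect to `x` is the identity, so the two should agree up to floating-point rounding.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # stand-in for learnable parameters
noise = torch.randn(3)                  # sampled independently of w


def loss(y):
    return (y ** 2).sum()


# Plain addition
x1 = w * 2.0
y_plain = noise + x1
(g_plain,) = torch.autograd.grad(loss(y_plain), w)

# Detach-based version from the thread
x2 = w * 2.0
y_detach = ((noise + x2).detach() - x2).detach() + x2
(g_detach,) = torch.autograd.grad(loss(y_detach), w)

same_output = torch.allclose(y_plain, y_detach)
same_grad = torch.allclose(g_plain, g_detach)
```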
You can try using torch.rand directly if torch.rand gives the gradients you want; otherwise you should use some trick like the STE to solve the backward problem.
My point is that there is no problem with the code x = noise + x. It will work just as well as the code that you provided (and it is much simpler, so it should be preferred). There is no problem that you’re solving by detaching and adding.
Also, the straight-through estimator is used when you use a discontinuous activation function, so it doesn’t apply to this case because there is no activation function that is being used. I still think that the trick you are thinking of is the reparameterization trick. (Also, the STE is usually referred to as an “estimator”—hence the “E” in “STE”—and not as a “trick”.)
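For contrast, here is a minimal sketch of where a straight-through estimator does make a difference: a discontinuous operation such as rounding, whose gradient is zero almost everywhere. The tensor values are arbitrary. The detach pattern keeps the rounded value in the forward pass but lets the backward pass treat the operation as the identity.

```python
import torch

x = torch.tensor([0.2, 0.7, 1.4], requires_grad=True)

# Plain rounding: the backward pass gives zero gradient
hard = torch.round(x)
hard.sum().backward()
grad_round = x.grad.clone()

# Straight-through version: forward value is round(x), backward is identity
x.grad = None
ste = x + (torch.round(x) - x).detach()
ste.sum().backward()
grad_ste = x.grad.clone()
```

With additive noise there is no such zero-gradient step to bypass, which is why plain addition already gives the identity gradient with respect to x.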