Where is the noise layer in pytorch?

I want to add some zero-centered Gaussian noise that is only active during training. Does PyTorch have a layer for this? Keras has one (the noise layer in Keras).


torch.randn generates noise
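
As a minimal sketch of what this reply suggests: zero-centered noise can be sampled with `torch.randn_like` and added only while training. The helper name `add_gaussian_noise` and its `sigma` and `training` arguments are hypothetical and not part of PyTorch.

```python
import torch


def add_gaussian_noise(x, sigma=0.1, training=True):
    """Return x plus zero-mean Gaussian noise with standard deviation sigma.

    Hypothetical helper for illustration. When training is False (or sigma
    is 0) the input is returned unchanged.
    """
    if not training or sigma == 0:
        return x
    # randn_like samples N(0, 1) with the same shape, dtype and device as x.
    return x + sigma * torch.randn_like(x)


x = torch.zeros(4, 3)
noisy = add_gaussian_noise(x, sigma=0.1, training=True)
clean = add_gaussian_noise(x, sigma=0.1, training=False)
```

Inside an `nn.Module`, the `training` flag would normally come from `self.training`, as in the modules posted below.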


Here is my code. Hope it helps.
The known issue is that it slows down my training by about 25%.

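class GaussianNoise(nn.Module):
    def __init__(self, stddev):
        super().__init__()  # needed so that nn.Module attributes such as self.training exist
        self.stddev = stddev

    def forward(self, din):
        if self.training:
            # The noise is sampled on the CPU and copied to the GPU on every call.
            return din + torch.randn(din.size()).cuda() * self.stddev
        return din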

Here is what I normally use. The major difference is that I do not move the noise to the GPU on every call, which should speed things up.

class GaussianNoise(nn.Module):
    """Gaussian noise regularizer.

    Args:
        sigma (float, optional): relative standard deviation used to generate the
            noise. Relative means that it will be multiplied by the magnitude of
            the value you are adding the noise to. This means that sigma can be
            the same regardless of the scale of the vector.
        is_relative_detach (bool, optional): whether to detach the variable before
            computing the scale of the noise. If `False` then the scale of the noise
            won't be seen as a constant but something to optimize: this will bias the
            network to generate vectors with smaller values.
    """

    def __init__(self, sigma=0.1, is_relative_detach=True):
        super().__init__()
        self.sigma = sigma
        self.is_relative_detach = is_relative_detach
        # `device` is assumed to be defined elsewhere, e.g. torch.device("cuda").
        self.noise = torch.tensor(0).to(device)

    def forward(self, x):
        if self.training and self.sigma != 0:
            scale = self.sigma * x.detach() if self.is_relative_detach else self.sigma * x
            sampled_noise = self.noise.repeat(*x.size()).normal_() * scale
            x = x + sampled_noise
        return x

Hey @YannDubs1,

Thanks for sharing the code. I am curious to know how I can use it as part of my implementation.
My model has two parts, an Encoder and a Decoder, and I want to add small noise (N(mean = 0, std = 0.1)) to the output of the Encoder, but I don’t know how to do that.

Something along those lines:

class Encoder(nn.Module):

    def __init__(self, ....):
        super().__init__()
        self.noise = GaussianNoise()

    def forward(self, x):
        ...  # compute `output` from x with the encoder's own layers
        output = self.noise(output)
        return output
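
To make the outline above concrete, here is a self-contained sketch under stated assumptions. `AdditiveGaussianNoise`, `AutoEncoder`, the linear layers and all sizes are hypothetical stand-ins chosen for illustration. Unlike the relative `GaussianNoise` module posted earlier, this variant adds absolute noise N(0, std^2), which matches the N(mean = 0, std = 0.1) asked for in the question.

```python
import torch
import torch.nn as nn


class AdditiveGaussianNoise(nn.Module):
    """Adds N(0, std^2) noise in training mode only (hypothetical helper)."""

    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std != 0:
            return x + self.std * torch.randn_like(x)
        return x


class Encoder(nn.Module):
    def __init__(self, in_features=8, latent_features=4):
        super().__init__()
        self.net = nn.Linear(in_features, latent_features)  # stand-in encoder
        self.noise = AdditiveGaussianNoise(std=0.1)

    def forward(self, x):
        # Noise is applied to the encoder output, before the decoder sees it.
        return self.noise(self.net(x))


class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = nn.Linear(4, 8)  # stand-in decoder

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = AutoEncoder()
x = torch.randn(2, 8)

model.eval()   # noise disabled: repeated calls give identical outputs
with torch.no_grad():
    eval_a, eval_b = model(x), model(x)

model.train()  # noise enabled: repeated calls differ
with torch.no_grad():
    train_a, train_b = model(x), model(x)
```

Because the noise layer is a submodule, `model.train()` and `model.eval()` switch it on and off together with the rest of the network.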

Thanks for your help, but I am still confused about how to add small noise to my network.

The code below is my Generator model. I would like to add small noise to the output of the encoder OR to the input of the decoder. In the code, ‘down’ is the Encoder part and ‘up’ is the Decoder part.

class UnetGenerator(nn.Module):

    def __init__(self, input_nc, output_nc, num_downs, ngf=64, norm_layer=nn.BatchNorm2d, use_dropout=False):

        super(UnetGenerator, self).__init__()
        unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=None, norm_layer=norm_layer, innermost=True)

        for i in range(num_downs - 5):
            unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer, use_dropout=use_dropout)

        unet_block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf, ngf * 2, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        self.model = UnetSkipConnectionBlock(output_nc, ngf, input_nc=input_nc, submodule=unet_block, outermost=True, norm_layer=norm_layer)

    def forward(self, input):

        return self.model(input)

class UnetSkipConnectionBlock(nn.Module):

    def __init__(self, outer_nc, inner_nc, input_nc=None,
                 submodule=None, outermost=False, innermost=False, norm_layer=nn.BatchNorm2d, use_dropout=False):

        super(UnetSkipConnectionBlock, self).__init__()

        self.outermost = outermost

        if type(norm_layer) == functools.partial:
            use_bias = norm_layer.func == nn.InstanceNorm2d
        else:
            use_bias = norm_layer == nn.InstanceNorm2d

        if input_nc is None:
            input_nc = outer_nc

        downconv = nn.Conv2d(input_nc, inner_nc, kernel_size=4,
                             stride=2, padding=1, bias=use_bias)
        downrelu = nn.LeakyReLU(0.2, True)
        downnorm = norm_layer(inner_nc)
        uprelu = nn.ReLU(True)
        upnorm = norm_layer(outer_nc)

        if outermost:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1)
            down = [downconv]
            up = [uprelu, upconv, nn.Tanh()]
            model = down + [submodule] + up

        elif innermost:
            upconv = nn.ConvTranspose2d(inner_nc, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1, bias=use_bias)

            down = [downrelu, downconv]
            up = [uprelu, upconv, upnorm]
            model = down + up
        else:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1, bias=use_bias)

            down = [downrelu, downconv, downnorm]
            up = [uprelu, upconv, upnorm]

            if use_dropout:
                model = down + [submodule] + up + [nn.Dropout(0.5)]
            else:
                model = down + [submodule] + up

        self.model = nn.Sequential(*model)

    def forward(self, x):
        if self.outermost:
            out = self.model(x)
            return out
        else:   # skip connections
            out = torch.cat([x, self.model(x)], 1)
            return out

Thanks in advance!
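
One possible way to do this in a block built from `down` and `up` lists is to put a noise module between the two lists of the innermost block, where the encoder ends and the decoder begins. The sketch below is an illustration under assumptions, not a drop-in patch: `AdditiveGaussianNoise` is a hypothetical helper, the channel sizes are made up, and the normalization layers are left out for brevity.

```python
import torch
import torch.nn as nn


class AdditiveGaussianNoise(nn.Module):
    """Adds N(0, std^2) noise in training mode only (hypothetical helper)."""

    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std != 0:
            return x + self.std * torch.randn_like(x)
        return x


# Layers named after the innermost block in the post; sizes are made up.
downrelu = nn.LeakyReLU(0.2)
downconv = nn.Conv2d(8, 16, kernel_size=4, stride=2, padding=1)
uprelu = nn.ReLU()
upconv = nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1)

down = [downrelu, downconv]
up = [uprelu, upconv]

# The noise layer sits between the deepest encoder output and the decoder input.
model = nn.Sequential(*(down + [AdditiveGaussianNoise(std=0.1)] + up))

x = torch.randn(1, 8, 16, 16)
out = model(x)
```

In the posted `UnetSkipConnectionBlock`, the corresponding change would be in the `innermost` branch, replacing `model = down + up` with a list that has the noise module in the middle.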

Thanks for the code, but it gives me the error “normal_cuda not implemented for Long”. I had to add .float() to the following line

sampled_noise = self.noise.repeat(*x.size()).normal_() * scale

so that it becomes

sampled_noise = self.noise.repeat(*x.size()).float().normal_() * scale

to resolve the error. But I’m not sure whether this changes the behavior or introduces another error.
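
For context on this error: `torch.tensor(0)` is created from a Python integer, so it has an integer (Long) dtype, and `normal_()` needs a floating-point tensor. Since `normal_()` overwrites every element with a fresh sample, casting with `.float()` first should only change the dtype and not the distribution of the noise. A sketch of a variant that avoids the issue altogether by sampling with `torch.randn_like`, which takes its dtype and device from `x`, could look like this (a rewrite for illustration, not the original author's code):

```python
import torch
import torch.nn as nn


class GaussianNoise(nn.Module):
    """Relative Gaussian noise, a variant of the module posted above."""

    def __init__(self, sigma=0.1, is_relative_detach=True):
        super().__init__()
        self.sigma = sigma
        self.is_relative_detach = is_relative_detach

    def forward(self, x):
        if self.training and self.sigma != 0:
            scale = self.sigma * (x.detach() if self.is_relative_detach else x)
            # randn_like matches x's dtype and device, so no Long tensor
            # and no global `device` variable are involved.
            x = x + torch.randn_like(x) * scale
        return x


layer = GaussianNoise(sigma=0.1)
x = torch.ones(3, 5)

layer.train()
noisy = layer(x)      # noise proportional to the magnitude of x

layer.eval()
unchanged = layer(x)  # identity in evaluation mode
```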

You can try:

noise = torch.randn_like(x)
x = ((noise + x).detach() - x).detach() + x

@111329 What is the purpose of doing x = ((noise + x).detach() - x).detach() + x? Doesn’t this have the same result as doing x = noise + x, while being computationally more expensive?

The noise is not differentiable, so we should use the STE trick so that gradients pass through as if there were no noise.

@111329 What is the STE trick? Do you mean the reparameterization trick? If so, I think the code x = noise + x already uses that trick.

The reparameterization trick is basically just to make sure that you don’t let the random number generation depend on your learnable parameters in any way (directly or indirectly), which it doesn’t do here. That is, you need to let the parameters you use to generate the random numbers be constants, and for example not generate random numbers that are sampled from a distribution with mean x, because then you use x as a parameter to the random number generation and x in turn depends on your learnable parameters.
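
A small sketch of that idea, with hypothetical parameter names `mu` and `sigma`: the random numbers are drawn from a fixed N(0, 1) that does not depend on any learnable value, and the parameters only enter through a deterministic transformation, so gradients reach them.

```python
import torch

# Hypothetical learnable parameters of the distribution we want to sample from.
mu = torch.tensor([0.5, -1.0], requires_grad=True)
sigma = torch.tensor([0.1, 2.0], requires_grad=True)

# The randomness comes from a fixed N(0, 1) with constant parameters.
eps = torch.randn(2)

# Deterministic, differentiable transformation of the sample.
z = mu + sigma * eps

z.sum().backward()
# d z / d mu is 1 and d z / d sigma is eps, so both parameters get gradients.
```

Adding `noise = torch.randn_like(x)` to `x` is the special case where the mean is `x` and the standard deviation is a constant.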

Maybe you can read this paper (I have not read it myself). STE is a basic trick in quantization-aware training.

@111329 Okay, maybe you can say that you are using the STE, although I have never heard of the STE in the context of noise addition; people usually talk about the reparameterization trick here.

But I still don’t think the code x = ((noise + x).detach() - x).detach() + x should behave any differently than x = noise + x; could you explain what the difference would be?

You can try using torch.rand directly if it gives you the gradients you want; otherwise you should use a trick like STE to solve the backward problem.

My point is that there is no problem with the code x = noise + x. It will work just as fine as the code that you provided (and it is much simpler so it should be preferred). There is no problem that you’re solving by detaching, subtracting x, detaching and adding x again.
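
This claim can be checked numerically. The sketch below compares the two expressions from the thread on a small random tensor and computes the gradient of each with respect to `x`; the variable names are chosen here for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)
noise = torch.randn_like(x)  # does not require grad

simple = noise + x
detached = ((noise + x).detach() - x).detach() + x

# Gradient of the summed output with respect to x for both variants.
(grad_simple,) = torch.autograd.grad(simple.sum(), x)
(grad_detached,) = torch.autograd.grad(detached.sum(), x)
```

Both variants give the same values up to floating-point rounding, and in both cases the gradient with respect to `x` is a tensor of ones, since the noise term is a constant as far as autograd is concerned.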

Also, the straight-through estimator is used when you use a discontinuous activation function, so it doesn’t apply to this case because there is no activation function that is being used. I still think that the trick you are thinking of is the reparameterization trick. (Also, the STE is usually referred to as an “estimator”—hence the “E” in “STE”—and not as a “trick”.)
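
For comparison, here is a sketch of a case where the straight-through estimator does change something: rounding, whose derivative is zero almost everywhere. The example values are arbitrary.

```python
import torch

x = torch.tensor([0.2, 0.7, 1.4], requires_grad=True)

# Plain rounding: autograd treats the derivative as zero, so no learning signal.
(grad_plain,) = torch.autograd.grad(torch.round(x).sum(), x)

# Straight-through estimator: the forward value is round(x), but the backward
# pass treats the rounding as the identity function.
y = x + (torch.round(x) - x).detach()
(grad_ste,) = torch.autograd.grad(y.sum(), x)
```

Here the detach pattern replaces a zero gradient with a gradient of ones. With additive noise there is nothing to replace, because `noise + x` already has that gradient.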