Backpropagating through noise

I want to add random Gaussian noise to my network weights on every forward pass. When backpropagating, I want to calculate gradients with respect to the distorted weights, then update the original weights using those gradients. Am I doing it right in the example below?

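```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(784, 10)
        self.gaussian = Normal(loc=0, scale=torch.ones_like(self.linear.weight))

    def forward(self, x):
        orig_weight = self.linear.weight.clone()
        noise = self.gaussian.sample()
        # Distort the weights for this forward pass only.
        self.linear.weight.data = self.linear.weight.data + noise
        x = self.linear(x)
        # Restore the original weights before returning.
        self.linear.weight.data = orig_weight.data
        return x


# Placeholder batch; substitute real data.
input = torch.randn(32, 784)
label = torch.randint(0, 10, (32,))

model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model.train()

for epoch in range(10):
    for i in range(100):
        optimizer.zero_grad()  # clear gradients left over from the previous step
        output = model(input)
        loss = nn.CrossEntropyLoss()(output, label)
        loss.backward()
        optimizer.step()
```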

I’ve debugged your code and it seems to do exactly what you wish to achieve.
I have to say I don’t really like the usage of `.data` in general, but this might be a valid use case.
At least I’m not sure how to make it better without manipulating the linear implementation.
Maybe someone else will have a good idea.

I appreciate your help! Initially I was surprised by your answer, because shortly after I posted my question, I realized that the (more) correct way to do it is like this:

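```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(784, 10)

    def forward(self, x):
        return self.linear(x)


# Placeholder batch; substitute real data.
input = torch.randn(32, 784)
label = torch.randint(0, 10, (32,))

model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model.train()

for epoch in range(10):
    for i in range(100):
        optimizer.zero_grad()  # clear gradients left over from the previous step

        # Save the original parameters, then distort them.
        orig_params = []
        for p in model.parameters():
            orig_params.append(p.clone())
            gaussian = Normal(loc=0, scale=torch.ones_like(p))
            p.data = p.data + gaussian.sample()

        output = model(input)
        loss = nn.CrossEntropyLoss()(output, label)

        # Backpropagate while the weights are still distorted.
        loss.backward()

        # Restore the original parameters before the update.
        for p, orig_p in zip(model.parameters(), orig_params):
            p.data = orig_p.data

        optimizer.step()
```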

However, now I see that the reason you didn’t see any issues with my initial example is that I simplified it too much. Because there’s no hidden layer to backpropagate through, restoring the weights before the backward pass does not make any difference. However, if we consider, for example, an MLP with a hidden layer:

```python
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(784, 100)
        self.linear2 = nn.Linear(100, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        return x
```

Well, now we would have a problem with the method in the first example: if we apply noise to both weight matrices (in linear1 and linear2), then the error would backpropagate through the original linear2 weights, while what we really want is to backpropagate through the distorted linear2 weights, to get the true gradient with respect to the distorted weights of the linear1 layer. Do you agree?
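
To spell out why the hidden layer matters, here is a short derivation for this two-layer network. The notation is my own and does not come from the code above: $x$ is an input column vector, $\tilde{W}_i = W_i + \epsilon_i$ are the distorted weights, $z_1 = \tilde{W}_1 x + b_1$, $h = \mathrm{ReLU}(z_1)$, $y = \tilde{W}_2 h + b_2$, and $\delta = \partial L / \partial y$.

$$
\frac{\partial L}{\partial \tilde{W}_2} = \delta\, h^\top,
\qquad
\frac{\partial L}{\partial \tilde{W}_1} = \left[ \left( \tilde{W}_2^\top \delta \right) \odot \mathbb{1}[z_1 > 0] \right] x^\top
$$

The gradient for the last layer depends only on quantities computed during the forward pass ($\delta$ and $h$), so restoring $W_2$ before the backward pass changes nothing. The gradient for the first layer contains $\tilde{W}_2$ explicitly, so if the backward pass sees the restored $W_2$, it uses $W_2^\top \delta$ in its place and gives a different result.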

By the way, I’m curious, how did you debug/verify the code?

You are right. I assumed you were adding the noise and resetting the weights of all layers before and after the update step, respectively.

Well, I just initialized the weights to a constant value, calculated the expected gradients and had a look at what happens after adding/resetting the weights and the update step.
It was just a numerical check to see whether the “right” gradients are used, as I’m always worried about manipulating `.data`.
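
A check along these lines could look like the sketch below. It is not the exact check described above: instead of constant weights and hand-computed gradients, it compares the gradients left in `p.grad` by the perturb/backward/restore pattern against reference gradients taken directly with respect to explicit noisy tensors. The small layer sizes, the random placeholder batch, and all variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(8, 5), nn.ReLU(), nn.Linear(5, 3))
x = torch.randn(4, 8)               # placeholder batch
target = torch.randint(0, 3, (4,))  # placeholder labels
criterion = nn.CrossEntropyLoss()

# Perturb every parameter, keeping the originals and the noise for later.
originals = [p.detach().clone() for p in model.parameters()]
noises = [torch.randn_like(p) for p in model.parameters()]
for p, n in zip(model.parameters(), noises):
    p.data = p.data + n

# Forward and backward both run while the weights are still distorted.
criterion(model(x), target).backward()

# Restore the clean weights; p.grad is left untouched.
for p, orig in zip(model.parameters(), originals):
    p.data = orig

# Reference: differentiate directly with respect to explicit noisy tensors.
w1, b1, w2, b2 = [(o + n).requires_grad_() for o, n in zip(originals, noises)]
hidden = torch.relu(F.linear(x, w1, b1))
ref_loss = criterion(F.linear(hidden, w2, b2), target)
ref_grads = torch.autograd.grad(ref_loss, [w1, b1, w2, b2])

grads_match = all(
    torch.allclose(p.grad, g, atol=1e-6)
    for p, g in zip(model.parameters(), ref_grads)
)
weights_restored = all(
    torch.equal(p.detach(), o) for p, o in zip(model.parameters(), originals)
)
print(grads_match, weights_restored)
```

Because the restore happens only after `backward()`, this version does not depend on how autograd treats a `.data` swap between the forward and backward passes.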

@michaelklachko Was there some paper which motivated you to do this? Can you share it?

I know this under the term “variational weight noise” or “variational parameter noise” but I’m not sure where that term comes from.

Regarding the implementation, I think this can be implemented in a similar way to `WeightNorm`, right?
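
One way this might look is with `torch.nn.utils.parametrize.register_parametrization`, the same mechanism the newer `torch.nn.utils.parametrizations.weight_norm` is built on. This is only a sketch: `AdditiveGaussianNoise` is a hypothetical class of my own, the `std` value is arbitrary, and switching the noise off in eval mode is an assumption rather than something discussed in this thread.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize


class AdditiveGaussianNoise(nn.Module):
    """Hypothetical parametrization: returns weight + N(0, std^2) noise."""

    def __init__(self, std=1.0):
        super().__init__()
        self.std = std

    def forward(self, weight):
        if not self.training:
            return weight  # assumption: no noise in eval mode
        return weight + self.std * torch.randn_like(weight)


layer = nn.Linear(784, 10)
parametrize.register_parametrization(layer, "weight", AdditiveGaussianNoise(std=1.0))

x = torch.randn(32, 784)  # placeholder batch
layer(x).sum().backward()

# The trainable tensor is the clean weight; the gradient lands there.
clean_weight = layer.parametrizations.weight.original
print(clean_weight.grad.shape)
```

Each access to `layer.weight` in training mode draws fresh noise, so every forward pass sees a different distortion while the optimizer only ever updates the clean tensor.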

At the time I was working on this paper: [1904.01705] Improving Noise Tolerance of Mixed-Signal Neural Networks

I actually ended up creating custom layers to do this: NoisyNet/hardware_model.py at master · michaelklachko/NoisyNet · GitHub
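
For readers who want the general idea without opening the repository, a minimal custom layer might look like the sketch below. It is not taken from the linked code: `NoisyLinear`, its `noise_std` argument, and the choice to disable noise in eval mode are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Linear):
    """Hypothetical layer: nn.Linear with fresh Gaussian weight noise per forward pass."""

    def __init__(self, in_features, out_features, noise_std=1.0):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        if not self.training:
            return F.linear(x, self.weight, self.bias)
        # The noise is a constant for autograd, so d(weight + noise)/d(weight) = I
        # and weight.grad equals the gradient at the distorted weights.
        noise = self.noise_std * torch.randn_like(self.weight)
        return F.linear(x, self.weight + noise, self.bias)


model = nn.Sequential(NoisyLinear(784, 100), nn.ReLU(), NoisyLinear(100, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

x = torch.randn(32, 784)             # placeholder batch
label = torch.randint(0, 10, (32,))  # placeholder labels

optimizer.zero_grad()
loss = F.cross_entropy(model(x), label)
loss.backward()
optimizer.step()  # updates the clean weights stored in the layers
```

Since the distorted weights are part of the autograd graph, the backward pass goes through them in every layer, and nothing needs to be saved or restored through `.data`.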

I’m not sure what “variational weight noise” is, but this paper might give you some ideas: [1506.02557] Variational Dropout and the Local Reparameterization Trick