# Perturbing the output of a loss function

Hi all,

As part of my team’s research, we are investigating applying a perturbation to the loss function of a neural network, so that we backpropagate a noisy loss rather than the true loss. We are experimenting with zero-mean normal (Gaussian) noise, varying the standard deviation. We have implemented the following code:

```python
loss_noisy = loss + np.random.normal(0, scale) * loss / loss.detach()
loss_noisy.backward()
```

This adds the normalised loss vector to itself, with a random scaling, to emulate the equation

L_noisy = L + N(0, scale)

We have produced results which show degraded performance of the network with increased scaling. However, we are unsure exactly how this is working under the hood. In particular, if a Loss value is composed of multiple samples, it is unclear how the perturbation is ‘distributed’ amongst the individual sample backpropagation steps.

I was hoping that someone might have some experience with this, and would understand how the perturbation is distributed during backpropagation.

Regards,

Elliott


Hi Elliott!

Let’s first cut to the chase and understand how your perturbation
affects the gradients computed by `.backward()`.

You have:

```python
loss_noisy = (1 + fac) * loss
```

where `fac` is a random deviate divided by `loss.detach()`.

`loss.detach()` is numerically equal to `loss`, but is not, itself,
differentiated in the backpropagation process.
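
A quick way to see this, using a toy parameter and loss rather than your actual model, is:

```python
import torch

# toy parameter and loss, standing in for the real model
p = torch.tensor(2.0, requires_grad=True)
loss = p ** 2

detached = loss.detach()

# numerically equal to loss ...
print(detached.item() == loss.item())   # True

# ... but treated as a constant by autograd
print(loss.requires_grad, detached.requires_grad)   # True False
```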

The whole gradient-computation process is linear in the final `loss`
scalar, so if `unperturbed_grad` is the result you would have obtained
by backpropagating `loss`, backpropagating `loss_noisy` will give you:

```python
some_parameter.grad = (1 + fac) * unperturbed_grad
```
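
You can check this scaling directly. Here is a minimal sketch with a toy scalar parameter, using a fixed `fac` in place of the random deviate:

```python
import torch

# toy scalar parameter; loss = p**2, so the unperturbed grad is 2*p
p = torch.tensor(3.0, requires_grad=True)
loss = p ** 2
loss.backward()
unperturbed_grad = p.grad.clone()   # tensor(6.)

# perturbed pass; a fixed fac stands in for the random deviate
p.grad = None
fac = 0.1
loss_noisy = (1 + fac) * (p ** 2)
loss_noisy.backward()

print(torch.allclose(p.grad, (1 + fac) * unperturbed_grad))   # True
```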

As for how the perturbation is distributed among individual samples,
your `loss` will typically be a sum (or average) over samples:

```python
loss = loss_batch = loss_samples.sum()
```

(where `loss_samples` is a vector of length `nBatch` of per-sample
losses).

If you were to compute:

```python
loss_samples_noisy = (1 + fac) * loss_samples
```

(where `fac` is still computed from the batch-level `loss.detach()`, i.e.,
the per-sample losses summed over the batch), then

```python
loss_noisy = loss_samples_noisy.sum()
```

So your perturbation gets distributed over the per-sample losses simply
by multiplying them individually by the same linear factor you used to
produce `loss_noisy`.

As an aside, I would probably perturb (add noise to) the predictions
of your model or to your target values, rather than perturbing `loss`
directly.
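
A hypothetical version of that alternative, assuming a simple regression setup (`model`, `x`, `target`, and `scale` are all made up for illustration):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # hypothetical model
x = torch.randn(8, 4)           # toy batch
target = torch.randn(8, 1)
scale = 0.1                     # noise standard deviation

pred = model(x)
# perturb the predictions rather than the loss itself
pred_noisy = pred + scale * torch.randn_like(pred)
loss = torch.nn.functional.mse_loss(pred_noisy, target)
loss.backward()

print(model.weight.grad.shape)   # torch.Size([1, 4])
```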

Best.

K. Frank


Hi KFrank!

Thanks for the detailed response, this has given us a lot to chew on. The observation that

```python
some_parameter.grad = (1 + fac) * unperturbed_grad
```

was incredibly helpful for us to investigate our problem.

Regards,
Elliott