# Perturbing the output of a loss function

Hi all,

As part of my team’s research, we are investigating applying a perturbation to the loss function of a neural network, so that we backpropagate a noisy loss rather than the true loss. We are experimenting with zero-mean normal (Gaussian) noise, varying the standard deviation. We have implemented the following code:

```python
loss_noisy = loss + np.random.normal(0, scale) * loss / loss.detach()
loss_noisy.backward()
```

This adds the normalised loss vector to itself, with a random scaling, to emulate the equation

L_noisy = L + N(0, scale)

We have produced results which show degraded performance of the network with increased scaling. However, we are unsure exactly how this is working under the hood. In particular, if a Loss value is composed of multiple samples, it is unclear how the perturbation is ‘distributed’ amongst the individual sample backpropagation steps.

I was hoping that someone might have some experience with this, and would understand how the perturbation is distributed during backpropagation.

Regards,

Elliott


Hi Elliott!

Let’s first cut to the chase and understand how your perturbation
affects the gradients computed by `.backward()`.

You have:

```python
loss_noisy = (1 + fac) * loss
```

where `fac` is a random deviate divided by `loss.detach()`.

`loss.detach()` is numerically equal to `loss`, but is not, itself,
differentiated in the backpropagation process.
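
A quick way to see this, using a toy parameter and loss rather than your actual model, is:

```python
import torch

# toy parameter and loss, standing in for the real model
p = torch.tensor(2.0, requires_grad=True)
loss = p ** 2

detached = loss.detach()

# numerically equal to loss ...
print(detached.item() == loss.item())   # True

# ... but treated as a constant by autograd
print(loss.requires_grad, detached.requires_grad)   # True False
```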

The whole gradient-computation process is linear in the final `loss`
scalar, so if `unperturbed_grad` is the result you would have obtained
by backpropagating `loss`, backpropagating `loss_noisy` will give you:

```python
some_parameter.grad = (1 + fac) * unperturbed_grad
```
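
You can check this scaling directly. Here is a minimal sketch with a toy scalar parameter, using a fixed `fac` in place of the random deviate:

```python
import torch

# toy scalar parameter; loss = p**2, so the unperturbed grad is 2*p
p = torch.tensor(3.0, requires_grad=True)
loss = p ** 2
loss.backward()
unperturbed_grad = p.grad.clone()   # tensor(6.)

# perturbed pass; a fixed fac stands in for the random deviate
p.grad = None
fac = 0.1
loss_noisy = (1 + fac) * (p ** 2)
loss_noisy.backward()

print(torch.allclose(p.grad, (1 + fac) * unperturbed_grad))   # True
```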

As for how the perturbation is distributed among individual samples,
your `loss` will typically be a sum (or average) over samples:

```python
loss = loss_batch = loss_samples.sum()
```

(where `loss_samples` is a vector of length `nBatch` of per-sample
losses).

If you were to compute:

```python
loss_samples_noisy = (1 + fac) * loss_samples
```

(where `fac` is still computed from the batch-level `loss.detach()`, i.e.,
the per-sample losses summed over the batch), then

```python
loss_noisy = loss_samples_noisy.sum()
```

So your perturbation gets distributed over the per-sample losses simply
by multiplying them individually by the same linear factor you used to
produce `loss_noisy`.

As an aside, I would probably perturb (add noise to) the predictions
of your model or to your target values, rather than perturbing `loss`
directly.
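
A hypothetical version of that alternative, assuming a simple regression setup (`model`, `x`, `target`, and `scale` are all made up for illustration):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # hypothetical model
x = torch.randn(8, 4)           # toy batch
target = torch.randn(8, 1)
scale = 0.1                     # noise standard deviation

pred = model(x)
# perturb the predictions rather than the loss itself
pred_noisy = pred + scale * torch.randn_like(pred)
loss = torch.nn.functional.mse_loss(pred_noisy, target)
loss.backward()

print(model.weight.grad.shape)   # torch.Size([1, 4])
```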

Best.

K. Frank


Hi KFrank!

Thanks for the detailed response, this has given us a lot to chew on. The observation that

```python
some_parameter.grad = (1 + fac) * unperturbed_grad
```

was incredibly helpful for us to investigate our problem.

Regards,
Elliott