When you perturb your parameters with `param.add_(perturbation)`, you are modifying in place the leaf tensor for which you want to calculate the gradient. The backward call needs the original saved tensors to compute the gradient, and modifying them in place makes the saved values "invalid" for backpropagation, as said here.
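A minimal sketch of the failure mode (with hypothetical tensors, not your exact code): the squaring op saves `w` for its backward formula, and the in-place perturbation then invalidates that saved value.

```python
import torch

w = torch.randn(3, requires_grad=True)

loss = (w ** 2).sum()        # pow saves w for its backward formula

with torch.no_grad():
    w.add_(torch.randn(3))   # in-place perturbation bumps w's version counter

loss.backward()              # RuntimeError: one of the variables needed for
                             # gradient computation has been modified by an
                             # inplace operation
```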
As an alternative, you can either create a copy of your model for each perturbation, or accept the perturbation vector as an input to the forward call and apply it there, without having to modify the parameters at all. Defining a suitable parametrization might also be a cleaner solution, but it's not really necessary.
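A sketch of the second option, using a hypothetical linear module: the perturbation is added out-of-place inside `forward`, so the parameter itself is never mutated.

```python
import torch
import torch.nn as nn

class PerturbedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x, perturbation=None):
        # out-of-place add: self.weight stays untouched
        weight = self.weight if perturbation is None else self.weight + perturbation
        return x @ weight.t()

model = PerturbedLinear(4, 2)
x = torch.randn(8, 4)
eps = 0.01 * torch.randn_like(model.weight)

loss = model(x, perturbation=eps).sum()
loss.backward()              # gradients flow to model.weight as usual
```

Since `self.weight + perturbation` is an out-of-place op, autograd saves unmodified tensors and `backward()` works for every perturbation you try.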