Custom loss function with a gradient operation

I am attempting to implement adversarial training with the Fast Gradient Sign Method, a widely-used adversarial attack (see section 6 of [1412.6572] Explaining and Harnessing Adversarial Examples). This involves writing a custom loss function that itself contains a gradient operation. What is the proper way to do this? I don't believe it is sufficient to call .backward() inside the loss computation, because that would accumulate gradients into the .grad attributes of my leaf tensors as a side effect.
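For concreteness, here is a minimal sketch of what I am trying to do. The tiny linear model, the epsilon step size, and the alpha weighting are stand-ins for my real setup, not anything from the paper:

```python
import torch
import torch.nn.functional as F

# Stand-in classifier and a fake batch (my real model is larger).
model = torch.nn.Linear(4, 3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
epsilon = 0.1

# Gradient of the clean loss with respect to the input.
x_adv = x.clone().requires_grad_(True)
clean_loss = F.cross_entropy(model(x_adv), y)
grad_x, = torch.autograd.grad(clean_loss, x_adv, create_graph=True)

# FGSM perturbation: a step of size epsilon along the sign of the gradient.
x_perturbed = x_adv + epsilon * grad_x.sign()
adv_loss = F.cross_entropy(model(x_perturbed), y)

# Combined adversarial-training objective (alpha is an assumed weighting).
alpha = 0.5
loss = alpha * clean_loss + (1 - alpha) * adv_loss
loss.backward()
```

The question is whether the inner gradient computation here has side effects on my tensors, and whether this is the idiomatic way to express it.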

From reading the documentation, it seems that the function I need is torch.autograd.grad, but the docs don’t clearly specify what side effects this function has. Will it modify the .grad attributes of my tensors?

Calling torch.autograd.grad will not accumulate gradients into your tensors' .grad attributes; it returns the gradients instead. By default it will free the computation graph after the call (unless you pass retain_graph=True).
Can your problem be solved by using a strategically-placed .detach somewhere in your loss computation?
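A quick demonstration of both points, on toy tensors rather than your model:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 2).sum()

# The gradients are returned, not accumulated into .grad:
grads, = torch.autograd.grad(y, x)
assert x.grad is None  # nothing was stored on the tensor

# But the graph was freed by the call, so differentiating y a second
# time fails unless retain_graph=True had been passed:
try:
    torch.autograd.grad(y, x)
except RuntimeError:
    print("graph already freed")
```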

Thanks for your reply! I did consider using .detach but it doesn’t seem to fit my use case, as I need to backpropagate through the gradient operation. I ended up using torch.autograd.grad with retain_graph=True and create_graph=True and it appears to be doing the right thing.
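In case it helps anyone who finds this thread, the behavior of create_graph=True in a reduced form. Note that retain_graph defaults to the value of create_graph, so passing both is redundant but harmless:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3

# create_graph=True records the gradient computation itself in the graph,
# so the returned gradient can be differentiated again.
g, = torch.autograd.grad(y, x, create_graph=True)  # dy/dx = 3x^2 = 27
g2, = torch.autograd.grad(g, x)                    # d2y/dx2 = 6x = 18
```

This double differentiation through the gradient operation is exactly what plain .detach would have cut off.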

I do have one follow-up question: in torch.autograd.grad, what is the grad_outputs argument used for? The description in the docs is unclear to me. I’m leaving it as the default value and my code appears to be working fine.
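Answering my own follow-up after some digging, in case the thread helps someone later: grad_outputs is the vector v in the vector-Jacobian product vᵀJ that autograd actually computes. For a scalar output it defaults to 1.0, which is why leaving it unset works for a scalar loss; for a non-scalar output you have to supply it. A toy illustration:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2  # non-scalar output; the Jacobian is 2 * I

# grad_outputs supplies the vector v in the vector-Jacobian product
# v^T J. Here v picks out the first row, so g == [2., 0., 0.].
v = torch.tensor([1.0, 0.0, 0.0])
g, = torch.autograd.grad(y, x, grad_outputs=v)
```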