This may sound like a trivial question, but I still don't understand how the gradients are calculated when applying the gradient penalty.

    gradients = torch_grad(
        outputs=prob_interpolated,
        inputs=interpolated,
        grad_outputs=torch.ones(prob_interpolated.size()).cuda()
            if self.use_cuda
            else torch.ones(prob_interpolated.size()),
        create_graph=True,
        retain_graph=True,
    )[0]
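For reference, `torch_grad` in the snippet is (I assume) `torch.autograd.grad`, which returns a tuple containing one gradient tensor per entry in `inputs`, even when only a single input is passed. A minimal standalone sketch of the call in isolation:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x ** 2).sum()

# torch.autograd.grad returns a tuple with one gradient per input,
# even for a single input tensor.
grads = torch.autograd.grad(outputs=y, inputs=x)

g = grads[0]  # gradient of y w.r.t. x, i.e. 2 * x
```

So with a single `inputs` tensor, the tuple has length one and `[0]` extracts that tensor.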

Why do we need the value at index `[0]`?