I want to optimise the gradient as in the paper “Improved Training of Wasserstein GANs”. Something like: D(input), D.backward(), loss = D_loss + || input.grad - 1||. I found that input.grad doesn’t have a creator, so the graph of the gradient isn’t connected to the net, and I can’t do something like input.grad.backward(). How could I do this? By the way, the authors of that paper used TensorFlow.
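
To make the goal concrete, here is a minimal sketch of the computation described above. It assumes a PyTorch version where `torch.autograd.grad` accepts `create_graph=True`, which keeps the gradient attached to the graph so it can be differentiated again. The critic `D`, the helper `gradient_penalty`, the tensor sizes, and the placeholder critic loss are all hypothetical. The penalty form `(||grad||_2 - 1)^2` and the weight of 10 follow the paper, but for simplicity the gradient is taken at the raw input rather than at the interpolated samples the paper uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy critic; any differentiable module would do.
D = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))


def gradient_penalty(D, x):
    """Penalise the deviation of ||dD/dx||_2 from 1, averaged over the batch."""
    # Work on a fresh leaf tensor that tracks gradients.
    x = x.detach().requires_grad_(True)
    out = D(x)
    # create_graph=True records the gradient computation itself,
    # so the returned gradient can be backpropagated through.
    (grad,) = torch.autograd.grad(
        outputs=out,
        inputs=x,
        grad_outputs=torch.ones_like(out),
        create_graph=True,
    )
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()


x = torch.randn(16, 4)
d_loss = -D(x).mean()  # placeholder for the usual critic loss
penalty = gradient_penalty(D, x)
loss = d_loss + 10.0 * penalty  # weight of 10 as in the paper
loss.backward()  # gradients of loss (including the penalty) w.r.t. D's parameters
```

The point of the sketch is that the gradient comes from `torch.autograd.grad` rather than from `input.grad` after a plain `backward()`, so it is an ordinary node in the graph and can be part of the loss.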