I want to optimise the gradient as in the paper “Improved Training of Wasserstein GANs”. Something like: D(input), D.backward(), loss = D_loss + ||input.grad - 1||. I found that input.grad doesn’t have a creator, so the graph of the gradient isn’t connected to the net, and I can’t do something like input.grad.backward(). How could I do this? By the way, the authors of that paper use TensorFlow.
It’s already implemented, but the PR is still waiting for review. It’s probably going to be merged next week.
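For anyone reading later, here is a rough sketch of how the penalty can be written in a PyTorch version that supports differentiating through gradients. It relies on `torch.autograd.grad` with `create_graph=True`, which returns a gradient that is itself part of the graph, instead of reading `input.grad`. This is not the code from the PR. The toy critic `D`, the tensor sizes, the placeholder critic loss and the weight of 10 are assumptions for illustration, and the penalty is written as (||grad||_2 - 1)^2 per sample as in the paper, not the ||input.grad - 1|| from the question. The paper also evaluates the penalty on points interpolated between real and generated samples, which this sketch skips.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy critic, only for illustration.
D = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

def gradient_penalty(critic, x):
    # The input must require grad so that d critic(x) / dx exists.
    x = x.detach().requires_grad_(True)
    out = critic(x)
    # create_graph=True keeps the graph of the gradient itself,
    # so the penalty can be backpropagated into the critic's weights.
    (grad,) = torch.autograd.grad(
        outputs=out,
        inputs=x,
        grad_outputs=torch.ones_like(out),
        create_graph=True,
    )
    # Per-sample L2 norm of the input gradient, pushed towards 1.
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

x = torch.randn(16, 4)           # stand-in batch of inputs
penalty = gradient_penalty(D, x)
d_loss = D(x).mean()             # placeholder for the real critic loss
loss = d_loss + 10.0 * penalty   # 10.0 is an assumed penalty weight
loss.backward()
```

After `loss.backward()`, the critic's parameters carry gradients from both the critic loss and the penalty term, so a normal optimizer step works as usual.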
Thank you, I’ll keep an eye out for it~