Hi guys, I hope someone can help me with this one.
I am trying to implement a training algorithm for GANs. I have defined my generator, discriminator and loss, let's say:
G = mygenerator()
D = mydiscriminator()
loss = myloss()
Now, going through one step of my training routine: once I have computed the loss (which is a function of both G and D), I need to compute its gradient w.r.t. the parameters of both networks. I will use X for the parameters of G and Y for those of D. I do this as:
grad_x = autograd.grad(loss, G.parameters(), create_graph=True, retain_graph=True, allow_unused=True)
grad_x_vec = torch.cat([g.contiguous().view(-1) for g in grad_x])
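For anyone who wants to reproduce this, here is a minimal self-contained sketch of the step above. The two `nn.Linear` layers and the `tanh` loss are hypothetical placeholders for `mygenerator()`, `mydiscriminator()` and `myloss()`, which are not shown in this post; only the `autograd.grad` and `torch.cat` calls mirror my actual code.

```python
import torch
from torch import autograd, nn

torch.manual_seed(0)

# Hypothetical stand-ins for mygenerator() / mydiscriminator()
G = nn.Linear(2, 3)   # parameters X: 6 weights + 3 biases = 9
D = nn.Linear(3, 1)   # parameters Y: 3 weights + 1 bias  = 4

z = torch.randn(4, 2)                # a small batch of noise vectors
loss = torch.tanh(D(G(z))).mean()    # placeholder loss depending on both G and D

# create_graph=True keeps the gradients differentiable,
# so they can be differentiated a second time later on.
grad_x = autograd.grad(loss, list(G.parameters()), create_graph=True, retain_graph=True)
grad_x_vec = torch.cat([g.contiguous().view(-1) for g in grad_x])

grad_y = autograd.grad(loss, list(D.parameters()), create_graph=True, retain_graph=True)
grad_y_vec = torch.cat([g.contiguous().view(-1) for g in grad_y])

print(grad_x_vec.shape, grad_y_vec.shape)
```

Both flattened gradients still carry a graph (`requires_grad` is `True`), which is what makes the second differentiation possible.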
The code for grad_y is similar. Now, in order to update the weights (I’ll refer only to X here), I need to compute the product Dxy * grad_y, where by Dxy I mean the second derivative of the loss taken first w.r.t. X and then w.r.t. Y.
I think it’s possible to compute this product in one step by the correct use of the grad_outputs parameter in autograd.grad.
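As a reference point for what `grad_outputs` does, here is a tiny sketch on a linear map, where the Jacobian is known in advance. The matrix `A` and the vector `v` are made-up values for illustration. `autograd.grad(y, x, grad_outputs=v)` returns the vector-Jacobian product, i.e. the transposed Jacobian applied to `v`, without ever building the Jacobian itself.

```python
import torch
from torch import autograd

# y = A @ x, so the Jacobian dy/dx is simply A, with shape (2, 3)
A = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
x = torch.zeros(3, requires_grad=True)
y = A @ x

# One weight per entry of the output y
v = torch.tensor([1., 10.])

# Vector-Jacobian product: has the shape of x, not the shape of y
(vjp,) = autograd.grad(y, x, grad_outputs=v)

print(vjp)        # tensor([41., 52., 63.])
print(A.t() @ v)  # tensor([41., 52., 63.])
```

Note that the result is indexed by the `inputs` argument (here `x`), while `grad_outputs` must match the shape of the `outputs` argument (here `y`).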
Here’s what is confusing me.
If I call:
autograd.grad(grad_y_vec, G.parameters(), grad_outputs = grad_y_vec)
In my mind I would obtain Dyx * grad_y_vec. But by simulating the same problem in a small-dimensional setting, where I can actually see what’s inside the tensors, I’ve realized that the output of the code above is Dxy * grad_y (which is what I actually need). This doesn’t make much sense to me, since I feel like I am differentiating first w.r.t. Y and then w.r.t. X. Does someone understand what I am getting wrong?
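Here is a sketch of the kind of small-dimensional check I mean. The tensors `x` and `y` stand in for the parameters X and Y, and the loss is a made-up function chosen only because its mixed second derivative is non-zero. The matrix `J` is built explicitly, one row per entry of grad_y_vec, so the shapes can be inspected directly.

```python
import torch
from torch import autograd

torch.manual_seed(0)
x = torch.randn(2, requires_grad=True)   # stands in for the parameters X of G
y = torch.randn(3, requires_grad=True)   # stands in for the parameters Y of D
M = torch.randn(2, 3)

# Hypothetical loss with a non-trivial mixed second derivative
loss = torch.sin(x) @ M @ torch.cos(y)

(grad_y_vec,) = autograd.grad(loss, y, create_graph=True)

# Explicit Jacobian of grad_y_vec w.r.t. x:
# J[i, j] = d(grad_y_vec[i]) / d x[j], shape (|Y|, |X|) = (3, 2)
J = torch.stack([
    autograd.grad(grad_y_vec[i], x, retain_graph=True)[0]
    for i in range(y.numel())
])

# The one-step call from above, with the vector treated as a constant
v = grad_y_vec.detach()
(out,) = autograd.grad(grad_y_vec, x, grad_outputs=v, retain_graph=True)

print(J.shape, out.shape)
print(torch.allclose(out, J.t() @ v))
```

In this toy case the one-step call agrees with `J.t() @ v`. The transposed matrix has shape (|X|, |Y|), with rows indexed by X and columns by Y, which matches the layout of what I am calling Dxy.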
Thanks for your time