Hello,
In the GAN example, while training the D-network on fake data, there is the line:
output = netD(fake.detach())
Q. What is the detach operation doing?
Q. This operation is not used in the Wasserstein GAN code. Why is it not needed in this model?
Q. Is the same effect being obtained by:
noisev = Variable(noise, volatile = True) # totally freeze netG
After reading the thread you referenced, it is still not completely clear to me what the answer to the second question would be. Could you elaborate on question 2?
Thanks in advance,
Hi Elias,
I think it’s just the author’s preference. In the Wasserstein implementation, they used x_new = Variable(x_old.data, ...)
to perform the detach operation. In my opinion, the new variable name makes it easier to read.
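For what it's worth, here is a minimal sketch (with hypothetical tiny networks, written against the modern tensor API where Variable is no longer needed) showing that both idioms cut the graph in the same way:

import torch
import torch.nn as nn

# Hypothetical tiny generator and discriminator, just for illustration.
netG = nn.Linear(10, 10)
netD = nn.Linear(10, 1)

noise = torch.randn(4, 10)
fake = netG(noise)

# Idiom 1 (DCGAN example): detach() returns a view cut off from the graph.
out1 = netD(fake.detach())

# Idiom 2 (WGAN style): rewrap the raw data as a fresh leaf with no history.
# In the old API this was Variable(fake.data); today fake.data or detach().
out2 = netD(fake.data)

(out1.sum() + out2.sum()).backward()
print(netG.weight.grad)              # None: no gradient flowed back into the generator
print(netD.weight.grad is not None)  # True: D still receives its gradients

Either way, D is trained on the fake samples while G's parameters are untouched by this backward pass.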
PyTorch keeps track of all operations that involve tensors, and these operations are recorded as a directed graph (the autograd graph).
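A small sketch of that recording, assuming a recent PyTorch where tensors carry the autograd state directly:

import torch

x = torch.ones(2, requires_grad=True)
y = x * 2     # recorded: y.grad_fn is <MulBackward0>
z = y.sum()   # recorded: z.grad_fn is <SumBackward0>
z.backward()  # autograd walks the recorded graph in reverse
print(x.grad) # tensor([2., 2.])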
detach() creates a new view of the tensor such that subsequent operations on it are no longer tracked, i.e. no gradient is computed for it and the subgraph is not recorded.
Hence intermediate results are not buffered for the backward pass, which saves memory.
So it's helpful when working with very large amounts of data.
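A minimal check of what detach() does to that bookkeeping (again assuming a recent PyTorch):

import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).detach()    # shares storage with x * 2, but is cut from the graph
print(y.requires_grad)  # False: y carries no history, further ops are not tracked
print(y.grad_fn)        # None: nothing is recorded or kept for a backward pass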