How does detach() work?

Hello,
In the GAN example, while training the D-network on fake data there is the line:
output = netD(fake.detach())

Q. What is the detach operation doing?
Q. This operation is not used in the Wasserstein GAN code. Why is it not needed in this model?
Q. Is the same effect being obtained by:
noisev = Variable(noise, volatile = True) # totally freeze netG

Thanks in advance,
Gautam

A simple search in the forums should be enough to answer this question. Here’s an example answer.

After reading the thread you referenced, it is still not completely clear to me what the answer to the second question would be. Could you elaborate on question 2?
Thanks in advance,

Hi Elias,
I think it's just the author's preference. In the Wasserstein implementation, they used
x_new = Variable(x_old.data, ...)
to perform the detach operation. In my opinion, giving the result a new variable name makes the code easier to read.
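
Side note: in current PyTorch, Variable has been merged into Tensor, so the two idioms line up as in the minimal sketch below. netG and netD here are tiny stand-in networks made up for illustration, not the models from either repository.

    import torch
    import torch.nn as nn

    # Stand-in generator and discriminator, made up for illustration.
    netG = nn.Linear(8, 8)
    netD = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())

    noise = torch.randn(4, 8)
    fake = netG(noise)

    # Idiom 1 (DCGAN example): detach() cuts the graph at `fake`.
    out1 = netD(fake.detach())

    # Idiom 2 (WGAN example, modern spelling of Variable(fake.data)):
    # rebuilding from the raw data likewise drops the autograd history.
    fake_frozen = fake.data  # today, fake.detach() is the preferred form
    out2 = netD(fake_frozen)

    # Either way, backprop through netD never reaches netG's parameters.
    out1.sum().backward()
    print(netG.weight.grad)  # None: the generator received no gradient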

To my understanding, detach() returns a tensor that is excluded from automatic differentiation, i.e., autograd stops tracking gradients through it.
Further information: http://www.bnikolic.co.uk/blog/pytorch-detach.html
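
To make that concrete, here is a small self-contained demo (the numbers are made up): detaching an intermediate value makes autograd treat it as a constant, which changes the gradient that reaches the leaf.

    import torch

    x = torch.ones(3, requires_grad=True)
    y = x * 2

    # Without detach: loss = sum(2x * x) = sum(2x^2), so dloss/dx = 4x = 4.
    loss = (y * x).sum()
    loss.backward()
    print(x.grad)    # tensor([4., 4., 4.])

    x.grad = None    # reset for the second run

    # With detach: y.detach() is treated as the constant 2, so dloss/dx = 2.
    loss = (y.detach() * x).sum()
    loss.backward()
    print(x.grad)    # tensor([2., 2., 2.])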

PyTorch keeps track of all operations that involve tensors, and these operations are recorded as a directed graph, i.e., a graph whose edges have a direction associated with them.

detach() creates a new view on which these operations are no longer tracked: no gradient is computed through it, and no subgraph is recorded for it. Because that graph does not need to be stored, memory is saved, which is helpful when working with very large amounts of data.
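
A short sketch of those properties (assuming current PyTorch): the detached result carries no grad_fn, and it is a view onto the same storage, so what is saved is the autograd bookkeeping, not a copy of the data.

    import torch

    a = torch.randn(5, requires_grad=True)
    b = (a * a).sum()
    print(b.grad_fn)        # <SumBackward0 ...>: a recorded graph node

    c = b.detach()
    print(c.grad_fn)        # None: no subgraph is recorded for c
    print(c.requires_grad)  # False

    # detach() returns a view sharing the original storage (no copy),
    # so it drops the graph bookkeeping, not the tensor data.
    d = a.detach()
    d[0] = 42.0
    print(a[0].item())      # 42.0 -- same underlying memory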
