Hello,
In the GAN example, while training the D-network on fake data, there is the line:
output = netD(fake.detach())
Q. What is the detach operation doing?
Q. This operation is not used in the Wasserstein GAN code. Why is it not needed in this model?
Q. Is the same effect being obtained by:
noisev = Variable(noise, volatile = True) # totally freeze netG
After reading the thread you referenced, it is still not completely clear to me what the answer to the second question would be. Could you elaborate on question 2?
Thanks in advance,
Hi Elias,
I think it’s just the author’s preference. In the Wasserstein implementation, they used x_new = Variable(x_old.data, ...)
to perform the detach operation. In my opinion, the new variable name makes it easier to read.
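For what it's worth, here is a minimal sketch (with hypothetical tiny networks, written against the modern tensor API where Variable is no longer needed) showing that both idioms cut the graph in the same way:

import torch
import torch.nn as nn

# Hypothetical tiny generator and discriminator, just for illustration.
netG = nn.Linear(10, 10)
netD = nn.Linear(10, 1)

noise = torch.randn(4, 10)
fake = netG(noise)

# Idiom 1 (DCGAN example): detach() returns a view cut off from the graph.
out1 = netD(fake.detach())

# Idiom 2 (WGAN style): rewrap the raw data as a fresh leaf with no history.
# In the old API this was Variable(fake.data); today fake.data or detach().
out2 = netD(fake.data)

(out1.sum() + out2.sum()).backward()
print(netG.weight.grad)              # None: no gradient flowed back into the generator
print(netD.weight.grad is not None)  # True: D still receives its gradients

Either way, D is trained on the fake samples while G's parameters are untouched by this backward pass.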
PyTorch keeps track of all operations that involve tensors, and these operations are recorded as a directed graph (the autograd graph).
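A small sketch of that recording, assuming a recent PyTorch where tensors carry the autograd state directly:

import torch

x = torch.ones(2, requires_grad=True)
y = x * 2     # recorded: y.grad_fn is <MulBackward0>
z = y.sum()   # recorded: z.grad_fn is <SumBackward0>
z.backward()  # autograd walks the recorded graph in reverse
print(x.grad) # tensor([2., 2.])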
detach() creates a new view of the tensor such that subsequent operations on it are no longer tracked, i.e. no gradient is computed for it and the subgraph is not recorded.
Hence intermediate results are not buffered for the backward pass, which saves memory.
So it's helpful when working with very large amounts of data.
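A minimal check of what detach() does to that bookkeeping (again assuming a recent PyTorch):

import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).detach()    # shares storage with x * 2, but is cut from the graph
print(y.requires_grad)  # False: y carries no history, further ops are not tracked
print(y.grad_fn)        # None: nothing is recorded or kept for a backward pass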