Because when training netD, we won’t need the buffers or gradients of netG. This may accelerate the training step and reduce memory usage.
Returns a new Variable, detached from the current graph.
Result will never require gradient. If the input is volatile, the output will be volatile too.
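To make the quoted description concrete, here is a minimal sketch of what detaching does. It assumes a recent PyTorch release (0.4 or later), where Variable and Tensor are merged and the volatile flag no longer exists, so it only illustrates the "detached from the graph, never requires gradient" part. The tensor names x, y, z and w are made up for illustration.

```python
import torch

# A leaf tensor that tracks gradients.
x = torch.ones(3, requires_grad=True)
y = x * 2          # y is part of the autograd graph rooted at x
z = y.detach()     # same values as y, but cut off from that graph

# A second leaf, so that there is still something to differentiate.
w = torch.ones(3, requires_grad=True)
loss = (z * w).sum()
loss.backward()

# Gradients flow to w, but stop at z and never reach x.
print(z.requires_grad)   # detached result does not require grad
print(w.grad)            # d(loss)/dw equals z
print(x.grad)            # nothing was propagated back to x
```

In terms of performance, backward() here does not traverse the part of the graph that produced y, which is the saving being discussed for netG.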
What does it mean for the variable to be detached and how does it affect performance?
I get the volatile backprop error with detach unless I do fake = Variable(fake.data)
If the idea is to not use volatile then what does it mean for the generator? Wouldn’t it be less efficient?
Right, but it will also copy the volatile attribute of its predecessor, which means that you must either recast the variable anyway or not use volatile for the generator.
So does detach serve a purpose here? I assume volatile in the generator is good, or why else have that flag option at all?
Because you don’t need to use volatile if you’re going to use detach later. It will be nearly equivalent, with volatile possibly being a bit faster and more memory efficient.
Actually, I tried detach at first, but it raises a RuntimeError:
-------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-25-5b95d5373ece> in <module>()
19 fake_pic = netg(noise_).detach()
20 output2 = netd(fake_pic)
---> 21 output2.backward(mone) #change for wgan
22 D_x2 = output2.data.mean()
23 optimizerD.step()
/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.pyc in backward(self, gradient, retain_variables)
149 """
150 if self.volatile:
--> 151 raise RuntimeError('calling backward on a volatile variable')
152 if gradient is None and self.requires_grad:
153 if self.data.numel() != 1:
RuntimeError: calling backward on a volatile variable
I used ipdb to debug, and it shows that both fake_pic and the output are volatile. It seems that even with detach, volatile still spreads through the whole net.
> /usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py(151)backward()
149 """
150 if self.volatile:
--> 151 raise RuntimeError('calling backward on a volatile variable')
152 if gradient is None and self.requires_grad:
153 if self.data.numel() != 1:
ipdb> u
> <ipython-input-27-5b95d5373ece>(21)<module>()
19 fake_pic = netg(noise_).detach()
20 output2 = netd(fake_pic)
---> 21 output2.backward(mone) #for wgan
22 D_x2 = output2.data.mean()
23 optimizerD.step()
ipdb> output2.volatile
True
ipdb> fake_pic.volatile
True
Yes, volatile will be propagated even if you use detach(). I meant that you could remove the volatile flag and use detach with Variables that don’t require grad.
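The suggestion above (drop volatile and rely on detach) can be sketched as a single discriminator step. This is only an illustration under several assumptions: it targets PyTorch 0.4 or later, where the volatile flag was removed and torch.no_grad() serves a similar purpose; the two nn.Linear layers are hypothetical stand-ins for the real netg and netd; and the loss is a plain mean rather than the WGAN-style backward(mone) used in the traceback.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the generator and discriminator in the thread.
netg = nn.Linear(4, 8)
netd = nn.Linear(8, 1)
optimizerD = torch.optim.SGD(netd.parameters(), lr=0.01)

noise = torch.randn(16, 4)

# Option A: run the generator normally, then cut the graph with detach().
fake_pic = netg(noise).detach()

# Option B: never build the generator graph at all. In current PyTorch,
# torch.no_grad() is the replacement for the old volatile flag.
with torch.no_grad():
    fake_pic_b = netg(noise)

# Discriminator update: backward() only reaches netd's parameters.
optimizerD.zero_grad()
output2 = netd(fake_pic).mean()
output2.backward()
optimizerD.step()

print(fake_pic.requires_grad, fake_pic_b.requires_grad)
print(netd.weight.grad is not None, netg.weight.grad is None)
```

Unlike the volatile Variable in the traceback, neither fake_pic nor fake_pic_b prevents backward() on the discriminator output, because netd's own parameters still require gradients. Option B additionally avoids storing the generator's intermediate buffers, which matches the remark that volatile could be slightly faster and more memory efficient.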