Example DCGAN: when updating netD, noise can be volatile

chenyuntc · March 5, 2017, 1:11pm

In the DCGAN example, the code that trains D (lines 214-216) is shown below:

noise.data.resize_(batch_size, nz, 1, 1)
noise.data.normal_(0, 1)
fake = netG(noise)

I think it would be better if we change it to:

noise = Variable(torch.Tensor(batch_size, nz, 1, 1).normal_(0, 1), volatile=True)
fake = Variable(netG(noise).data)

Because when training netD, we don’t need the buffers or gradients of netG. This may speed up the training step and reduce memory usage.

smth · March 5, 2017, 3:47pm

You are correct, changing it like this is better. Even better, you can do:

fake = netG(noise)
fake = fake.detach()  # detach() returns a new Variable, so keep the result

If you are interested, please send a pull request to fix it, I will merge.

Veril · March 5, 2017, 4:06pm

The docs say this about detach:

Returns a new Variable, detached from the current graph.
Result will never require gradient. If the input is volatile, the output will be volatile too.

What does it mean for the variable to be detached and how does it affect performance?

I get the volatile backprop error with detach unless I do fake = Variable(fake.data)
If the idea is to not use volatile then what does it mean for the generator? Wouldn’t it be less efficient?

smth · March 5, 2017, 7:27pm

detach will basically just forget about its creator attribute, and hence won’t backprop through any of the paths that created it.
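A minimal sketch of that behaviour, assuming a later PyTorch release (0.4 or newer) in which `Variable` was merged into `Tensor` and the `creator` attribute is called `grad_fn`. The tensor names are only illustrative:

```python
import torch

# A leaf tensor that tracks gradients, and a result computed from it.
x = torch.ones(3, requires_grad=True)
y = x * 2  # y keeps a grad_fn that links it back to x

# detach() returns a NEW tensor; y itself is left unchanged.
z = y.detach()

print(y.requires_grad, y.grad_fn is not None)  # True True
print(z.requires_grad, z.grad_fn)              # False None

# The detached tensor shares storage with the original, so no data is copied.
print(z.data_ptr() == y.data_ptr())            # True
```

Because `z` has no `grad_fn`, anything computed from it cannot backpropagate into `x`.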

Veril · March 5, 2017, 7:35pm

Right, but it will also copy the volatile attribute of its predecessor, which means that you must either recast the variable anyway or not use volatile for the generator.

So does detach serve a purpose here? I assume volatile in the generator is good, or why else have that flag option at all?

apaszke · March 5, 2017, 10:28pm

Because you don’t need to use volatile if you’re going to use detach later. It will be nearly equivalent, with volatile possibly being a bit faster and more memory efficient.

chenyuntc · March 6, 2017, 2:48am

Actually, I tried detach at first, but it raised a RuntimeError:

-------------------------------------------------------------------
RuntimeError                      Traceback (most recent call last)
<ipython-input-25-5b95d5373ece> in <module>()
     19         fake_pic = netg(noise_).detach()
     20         output2 = netd(fake_pic)
---> 21         output2.backward(mone) #change for wgan
     22         D_x2 = output2.data.mean()
     23         optimizerD.step()

/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.pyc in backward(self, gradient, retain_variables)
    149         """
    150         if self.volatile:
--> 151             raise RuntimeError('calling backward on a volatile variable')
    152         if gradient is None and self.requires_grad:
    153             if self.data.numel() != 1:

RuntimeError: calling backward on a volatile variable

I used ipdb to debug, and it shows that both fake_pic and output2 are volatile. It seems that even with detach, volatile still spreads through the whole net.

> /usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py(151)backward()
    149         """
    150         if self.volatile:
--> 151             raise RuntimeError('calling backward on a volatile variable')
    152         if gradient is None and self.requires_grad:
    153             if self.data.numel() != 1:

ipdb> u
> <ipython-input-27-5b95d5373ece>(21)<module>()
     19         fake_pic = netg(noise_).detach()
     20         output2 = netd(fake_pic)
---> 21         output2.backward(mone) #for wgan
     22         D_x2 = output2.data.mean()
     23         optimizerD.step()

ipdb> output2.volatile
True
ipdb> fake_pic.volatile
True

This matches what the docs for detach say:

If the input is volatile, the output will be volatile too.
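The propagation rule can be illustrated with a toy model in plain Python. This is not PyTorch code: `ToyVariable`, `toy_apply` and `run` are hypothetical names, and the sketch only mimics the flag rules described in this thread (any volatile input makes the output volatile, and detach copies the volatile flag).

```python
class ToyVariable:
    """Toy stand-in for the 2017-era Variable; NOT the real implementation."""

    def __init__(self, value, requires_grad=False, volatile=False, creator=None):
        self.value = value
        self.volatile = volatile
        # In this toy model a volatile variable never requires grad.
        self.requires_grad = requires_grad and not volatile
        self.creator = creator

    def detach(self):
        # Forgets the creator and requires_grad, but copies the volatile flag.
        return ToyVariable(self.value, requires_grad=False, volatile=self.volatile)

    def backward(self):
        if self.volatile:
            raise RuntimeError('calling backward on a volatile variable')


def toy_apply(name, *inputs, has_trainable_params=False):
    """Toy op: the output is volatile if ANY input is volatile."""
    volatile = any(v.volatile for v in inputs)
    requires_grad = has_trainable_params or any(v.requires_grad for v in inputs)
    return ToyVariable(None, requires_grad=requires_grad,
                       volatile=volatile, creator=name)


def run(noise_volatile):
    """Mimic netd(netg(noise).detach()).backward() and report what happens."""
    noise = ToyVariable('z', volatile=noise_volatile)
    fake = toy_apply('netG', noise, has_trainable_params=True).detach()
    out = toy_apply('netD', fake, has_trainable_params=True)
    try:
        out.backward()
        return 'ok'
    except RuntimeError as err:
        return str(err)


print(run(noise_volatile=True))   # calling backward on a volatile variable
print(run(noise_volatile=False))  # ok
```

In this model the volatile flag on the noise survives both the generator and detach, so the discriminator output is volatile as well. Dropping the flag and relying on detach alone lets backward run.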

chenyuntc · March 6, 2017, 3:04am

I was working on the commit, but when I pulled the latest code, I found that this commit is a better solution, so I probably won’t send a PR.

apaszke · March 6, 2017, 8:16am

Yes, volatile will be propagated even if you use detach(). I meant that you could remove the volatile flag and use detach with Variables that don’t require grad.
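A rough sketch of a discriminator update that follows this advice, assuming PyTorch 0.4 or newer, where the volatile flag was removed and `torch.no_grad()` took over its role. The tiny `nn.Linear` networks, the sizes, and the mean-based loss are placeholders, not the DCGAN example's actual code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, nz = 4, 8

# Tiny stand-ins for the DCGAN generator and discriminator (hypothetical sizes).
netG = nn.Linear(nz, 16)
netD = nn.Linear(16, 1)
optimizerD = torch.optim.SGD(netD.parameters(), lr=0.1)

noise = torch.randn(batch_size, nz)

# Option 1: build the generator graph, then cut it off before it reaches netD.
fake = netG(noise).detach()

# Option 2: do not build the generator graph at all (the modern analogue of
# a volatile input, but without propagating into netD).
with torch.no_grad():
    fake_no_graph = netG(noise)

# Discriminator step on the detached fake batch (placeholder loss).
optimizerD.zero_grad()
loss = netD(fake).mean()
loss.backward()
optimizerD.step()

# Gradients reached netD only; netG's parameters were never touched.
print(all(p.grad is None for p in netG.parameters()))      # True
print(all(p.grad is not None for p in netD.parameters()))  # True
```

Either option keeps the discriminator's backward pass from reaching the generator; the second also avoids storing the generator's intermediate buffers.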