In the DCGAN example, it is done like this:

```
optimizer_D.zero_grad()
loss_real = compute_loss_real()
loss_real.backward()    # gradients from the real batch land in .grad
loss_fake = compute_loss_fake()
loss_fake.backward()    # gradients from the fake batch are added on top
optimizer_D.step()
```

I also found another implementation that looks like this:

```
optimizer_D.zero_grad()
loss_real = compute_loss_real()
loss_fake = compute_loss_fake()
loss_total = (loss_real + loss_fake) / 2.0   # average the two losses
loss_total.backward()
optimizer_D.step()
```

Which is the correct way? Is there any benefit to using the second way? My code gets a small gain with the second way.

crcrpar
(Masaki Kozuki)
June 12, 2018, 6:37am
#3
Hi,

Until `optimizer.zero_grad()` is called, gradients are accumulated. So, in general, the two implementations work the same way, but the scale of the gradients will be different: the second one divides the sum by 2, so its gradients are half as large.
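
For example, here is a minimal sketch with a toy linear layer (dummy losses standing in for your `compute_loss_real` / `compute_loss_fake`, so this is an assumption, not your actual code) showing that two `backward()` calls accumulate the same gradients as one `backward()` on the summed loss:

```
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 1)          # toy stand-in for the discriminator
x_real = torch.randn(8, 4)
x_fake = torch.randn(8, 4)

# First way: two backward() calls; gradients accumulate in .grad
layer.zero_grad()
layer(x_real).mean().backward()
layer(x_fake).mean().backward()
grad_accumulated = layer.weight.grad.clone()

# Second way without the / 2.0: one backward() on the summed loss
layer.zero_grad()
(layer(x_real).mean() + layer(x_fake).mean()).backward()
grad_summed = layer.weight.grad.clone()

print(torch.allclose(grad_accumulated, grad_summed))  # True
```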

Do you mean they may produce different solutions, since you said “the scale of gradients would be different”?

crcrpar
(Masaki Kozuki)
June 12, 2018, 12:01pm
#5
Yes.

With the second way, each parameter’s update step will be smaller, and this might help stabilize training, but I’m not sure.
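
Concretely (same toy setup as above), the `/ 2.0` just halves the gradients, which for plain SGD is equivalent to halving the learning rate (with Adam the effect is less direct, since it rescales gradients):

```
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 1)          # toy stand-in for the discriminator
x_real = torch.randn(8, 4)
x_fake = torch.randn(8, 4)

# Averaged loss, as in the second implementation
layer.zero_grad()
((layer(x_real).mean() + layer(x_fake).mean()) / 2.0).backward()
grad_averaged = layer.weight.grad.clone()

# Accumulated gradients, as in the first implementation
layer.zero_grad()
layer(x_real).mean().backward()
layer(x_fake).mean().backward()
grad_accumulated = layer.weight.grad.clone()

print(torch.allclose(grad_averaged, grad_accumulated / 2.0))  # True
```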