for epoch in range(num_epochs):
    for batch_idx, (real, _) in enumerate(loader):
        real = real.view(-1, 784).to(DEVICE)
        batch_size = real.shape[0]

        # Discriminator
        noise = torch.randn(batch_size, z_dim).to(DEVICE)
        fake = gen(noise)
        disc_real = disc(real).view(-1)
        lossD_real = criterion(disc_real, torch.ones_like(disc_real))
        disc_fake = disc(fake).view(-1)
        lossD_fake = criterion(disc_fake, torch.zeros_like(disc_fake))
        lossD = (lossD_real + lossD_fake) / 2
        disc.zero_grad()
        lossD.backward(retain_graph=True)
        opt_disc.step()

        # Generator
        output = disc(fake).view(-1)
        lossG = criterion(output, torch.ones_like(output))
        gen.zero_grad()
        lossG.backward()
        opt_gen.step()
In the Generator part you can see that I use:

# Generator
output = disc(fake).view(-1)

which recomputes disc(fake), even though disc_fake = disc(fake).view(-1) was already computed in the Discriminator part above. Why does the following not work?:

# Generator
# output = disc(fake).view(-1)
lossG = criterion(disc_fake, torch.ones_like(disc_fake))

It fails with:
RuntimeError:
one of the variables needed for gradient computation has been modified by
an in-place operation: [torch.cuda.FloatTensor [128, 1]], which is output 0 of
AsStridedBackward0, is at version 18754; expected version 18753 instead. Hint: enable
anomaly detection to find the operation that failed to compute its gradient, with
torch.autograd.set_detect_anomaly(True).
I don’t understand why this redundant computation is done and why the result can’t be copied.
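My current guess is that opt_disc.step() updates the discriminator's weights in place, so the graph that was recorded with the old weights can no longer be backpropagated through. To check whether that alone triggers the error, I put together this toy example outside of the GAN (the names lin, opt and inp are just made up for the test, and SGD stands in for whatever optimizer is actually used):

import torch
import torch.nn as nn

# Stand-in for disc: one linear layer. The input requires grad, just like
# fake = gen(noise) does in the training loop above.
lin = nn.Linear(4, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

inp = torch.randn(2, 4, requires_grad=True)
out = lin(inp)                            # forward pass saves the weight for backward

out.sum().backward(retain_graph=True)     # first backward works, weight still unchanged
opt.step()                                # in-place update of lin.weight

out.sum().backward()                      # RuntimeError: one of the variables needed for
                                          # gradient computation has been modified by an
                                          # inplace operation

As far as I can tell, this fails with the same kind of RuntimeError, so reusing disc_fake after opt_disc.step() seems to hit the same problem. I am not sure this is really the whole story, though.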
I tried doing the optimizer steps only after all gradients were computed, which resolved the problem:
for epoch in range(num_epochs):
    for batch_idx, (real, _) in enumerate(loader):
        real = real.view(-1, 784).to(DEVICE)
        batch_size = real.shape[0]

        # Discriminator
        noise = torch.randn(batch_size, z_dim).to(DEVICE)
        fake = gen(noise)
        disc_real = disc(real).view(-1)
        lossD_real = criterion(disc_real, torch.ones_like(disc_real))
        disc_fake = disc(fake).view(-1)
        lossD_fake = criterion(disc_fake, torch.zeros_like(disc_fake))
        lossD = (lossD_real + lossD_fake) / 2
        disc.zero_grad()
        lossD.backward(retain_graph=True)

        # Generator
        # output = disc(fake).view(-1)
        # now I can reuse disc_fake from above
        lossG = criterion(disc_fake, torch.ones_like(disc_fake))
        gen.zero_grad()
        lossG.backward()

        # optimizer steps moved to after both backward calls
        opt_disc.step()
        opt_gen.step()
Maybe there is a misunderstanding on my side about how gradients work in PyTorch. Could someone please share some insights?