Hi all, I am trying to convert a GAN-related project written in TensorFlow into PyTorch. In this code, I want to update the discriminator twice and the generator multiple times per iteration, but I always hit an error that I cannot fix: "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation".
I understand that this happens because each optimizer step modifies the weights of the discriminator or generator in place, so a later gradient computation through the old graph fails. However, this workflow is possible in TensorFlow, because there you simply take out the gradients once and then apply the update multiple times.
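To make sure I am describing the mechanism correctly, here is a toy sketch (not my actual training code; the model and sizes are made up) of the pattern that I believe triggers the same error: backward with retain_graph=True, an optimizer step that changes the weights in place, then another backward through the retained graph.

import torch
import torch.nn as nn

# Made-up two-layer model, only to illustrate the failure pattern.
net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = net(x).mean()

loss.backward(retain_graph=True)  # keep the graph for a second backward
opt.step()                        # modifies the weights of net in place

# The retained graph still refers to the old weight tensors, so this
# second backward raises the same "modified by an inplace operation" error.
loss.backward()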
I really don't know how to handle this problem, and I have been stuck on it for days. Could someone please help me with it? Many thanks. Below is my code.
# ================================================================== #
# Get the real, noise #
# ================================================================== #
x_real, targets = images.cuda(), targets.cuda()
x_real = 2 * x_real - 1
x_noise = maximizer(x_real)
x_perturbed = x_real + epsilon[dataset] * x_noise
d_out_real = minimizer(x_real)
d_out_noise = minimizer(x_perturbed)
d_loss_real = criterion(d_out_real, targets)
d_loss_noise = criterion(d_out_noise, targets)
d_loss = d_loss_real + d_loss_noise
g_loss = -1.0 * d_loss_noise
# update the discriminator (minimizer) on d_loss
reset_grad()
d_loss.backward(retain_graph=True)
min_optimizer1.step()
# ================================================================== #
# Train the generator #
# ================================================================== #
# 1st step
reset_grad()
g_loss.backward(retain_graph=True)
max_optimizer.step()
# 2nd step
x_noise2 = maximizer(x_real)
x_perturbed2 = x_real + epsilon[dataset] * x_noise2
d_out_noise2 = minimizer(x_perturbed2)
d_loss_noise2 = criterion(d_out_noise2, targets)
g_loss2 = -1.0 * d_loss_noise2
reset_grad()
g_loss2.backward(retain_graph=True)
max_optimizer.step()
# 3rd step
x_noise3 = maximizer(x_real)
x_perturbed3 = x_real + epsilon[dataset] * x_noise3
d_out_noise3 = minimizer(x_perturbed3)
d_loss_noise3 = criterion(d_out_noise3, targets)
g_loss3 = -1.0 * d_loss_noise3
reset_grad()
g_loss3.backward(retain_graph=True)
max_optimizer.step()
# 4th step
x_noise4 = maximizer(x_real)
x_perturbed4 = x_real + epsilon[dataset] * x_noise4
d_out_noise4 = minimizer(x_perturbed4)
d_loss_noise4 = criterion(d_out_noise4, targets)
g_loss4 = -1.0 * d_loss_noise4
reset_grad()
g_loss4.backward(retain_graph=True)
max_optimizer.step()
# 5th step
x_noise5 = maximizer(x_real)
x_perturbed5 = x_real + epsilon[dataset] * x_noise5
d_out_noise5 = minimizer(x_perturbed5)
d_loss_noise5 = criterion(d_out_noise5, targets)
g_loss5 = -1.0 * d_loss_noise5
reset_grad()
g_loss5.backward(retain_graph=True)
max_optimizer.step()
# 6th step: virtual step
x_noise6 = maximizer(x_real)
x_perturbed6 = x_real + epsilon[dataset] * x_noise6
d_out_noise6 = minimizer(x_perturbed6)
g_loss6 = -1.0 * criterion(d_out_noise6, targets)
d_loss_noise6 = criterion(d_out_noise6, targets)
reset_grad()
g_loss6.backward(retain_graph=True)
max_virtual_optimizer.step()
# ================================================================== #
# Train the discriminator #
# ================================================================== #
# compute noiseB
x_noiseB = maximizer(x_real)
x_perturbedB = x_real + epsilon[dataset] * x_noiseB
d_out_noiseB = minimizer(x_perturbedB)
d_loss_noiseB = criterion(d_out_noiseB, targets)
# combine: loss_real + loss_noise6 + loss_noiseB
d_loss_full = d_loss_real + g_loss6 + gamma * (g_loss6 - d_loss_noiseB) / g_step_size
reset_grad()
d_loss_full.backward(retain_graph=True)
min_optimizer1.step()
# restore the parameters of G (undo the virtual step)
minus_g_loss = -1.0 * g_loss6
reset_grad()
minus_g_loss.backward(retain_graph=True)
max_virtual_optimizer.step()