I previously had some code for a GAN:
generated_samples = G(z)
# Train Discriminator
pred_generated = D(generated_samples.detach())
pred_real = D(real_samples)
loss = BCELoss(pred_generated, 0) + BCELoss(pred_real, 1)
loss.backward()
# Train Generator
pred_generated2 = D(generated_samples)
loss = BCELoss(pred_generated2, 1)
loss.backward()
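For clarity: BCELoss(pred, 0) / BCELoss(pred, 1) above is shorthand for binary cross-entropy against all-zeros / all-ones targets, and the optimizer steps are left out of the snippets. A minimal runnable sketch of this one-step loop, using toy stand-ins for G, D, the optimizers and the data since only the structure matters here, looks roughly like this:

import torch
import torch.nn as nn

# toy stand-ins; the real G, D, optimizers and data come from the actual project
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
g_optimizer = torch.optim.Adam(G.parameters(), lr=2e-4)
d_optimizer = torch.optim.Adam(D.parameters(), lr=2e-4)
criterion = nn.BCELoss()

z = torch.randn(4, 16)
real_samples = torch.randn(4, 8)

# Train Discriminator
d_optimizer.zero_grad()
generated_samples = G(z)
pred_generated = D(generated_samples.detach())  # detach cuts the graph back to G
pred_real = D(real_samples)
d_loss = criterion(pred_generated, torch.zeros_like(pred_generated)) \
       + criterion(pred_real, torch.ones_like(pred_real))
d_loss.backward()
d_optimizer.step()

# Train Generator
g_optimizer.zero_grad()
pred_generated2 = D(generated_samples)  # D is re-run on the non-detached samples
g_loss = criterion(pred_generated2, torch.ones_like(pred_generated2))
g_loss.backward()
g_optimizer.step()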
That worked fine. I then updated the discriminator to consist of two steps (I want to save the intermediate output for later use). The first step returns a sequence of tensors, called features:
generated_samples = G(z)
# Train Discriminator
generated_features = get_features(generated_samples) # new
real_features = get_features(real_samples)
real_features = [f.detach() for f in real_features]
pred_generated = D([f.detach() for f in generated_features])
pred_real = D(real_features)
loss = BCELoss(pred_generated, 0) + BCELoss(pred_real, 1)
loss.backward()
# Train Generator
pred_generated2 = D(generated_features)
loss = BCELoss(pred_generated2, 1)
loss.backward()
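To make the new structure concrete, a toy stand-in for the two-step discriminator could look like the sketch below; the names get_features and D mirror my code, while the layers and shapes are made up purely for illustration:

import torch
import torch.nn as nn

feat1 = nn.Linear(8, 32)
feat2 = nn.Linear(32, 64)
head = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())

def get_features(x):
    # step 1: return a sequence of intermediate tensors ("features")
    f1 = torch.relu(feat1(x))
    f2 = torch.relu(feat2(f1))
    return [f1, f2]

def D(features):
    # step 2: consume the feature sequence (this toy head only uses the last one)
    return head(features[-1])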
But with this change I get an error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [320]] is at version 5; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
The error seems to relate to this line:
pred_generated = D([f.detach() for f in generated_features])
How should I correctly detach the tensors in generated_features as I pass them to the discriminator? And why does this work for a single tensor but not for a sequence of tensors?
Edit: Neither f.clone().detach() nor f.detach().clone() seems to work.
Edit2: I tried the anomaly detection, and it points to the line generated_features = get_features(generated_samples) as the “root” of the error.
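For reference, enabling it is just the switch from the error hint, placed before the training loop (it slows autograd down, so I only turn it on while debugging):

import torch

torch.autograd.set_detect_anomaly(True)  # report the forward op responsible for a failing backward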
Edit3: The following code works, but then I have to extract the features one extra time, which is not efficient:
generated_samples = G(z)
# Train Discriminator
generated_features = get_features(generated_samples.detach()) # new
real_features = get_features(real_samples)
real_features = [f.detach() for f in real_features]
pred_generated = D([f.detach() for f in generated_features])
pred_real = D(real_features)
loss = BCELoss(pred_generated, 0) + BCELoss(pred_real, 1)
loss.backward()
# Train Generator
generated_features = get_features(generated_samples) # new
pred_generated2 = D(generated_features)
loss = BCELoss(pred_generated2, 1)
loss.backward()