Detaching a sequence of tensors gives an in-place modification error

I previously had some code for a GAN:

generated_samples = G(z)

# Train Discriminator
pred_generated = D(generated_samples.detach())  # detach: the D update should not backprop into G
pred_real = D(real_samples)

loss = BCELoss(pred_generated, 0) + BCELoss(pred_real, 1)  # shorthand: target 0 = fake, 1 = real
loss.backward()

# Train Generator
pred_generated2 = D(generated_samples)  # no detach: gradients flow back into G

loss = BCELoss(pred_generated2, 1)
loss.backward()
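
(For clarity, BCELoss(pred, 0) / BCELoss(pred, 1) is shorthand for a criterion with all-zeros / all-ones targets. A minimal runnable version of this loop, with toy stand-in models and the optimizers I omitted above, would look something like this:)

import torch
import torch.nn as nn

# Toy stand-ins just to make the snippet runnable; my real G/D are different.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
D = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
criterion = nn.BCELoss()

z = torch.randn(32, 8)
real_samples = torch.randn(32, 4)

generated_samples = G(z)

# Train Discriminator: detach so the D update does not backprop into G
pred_generated = D(generated_samples.detach())
pred_real = D(real_samples)
loss_D = criterion(pred_generated, torch.zeros_like(pred_generated)) + \
         criterion(pred_real, torch.ones_like(pred_real))
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# Train Generator: reuse generated_samples without detaching
pred_generated2 = D(generated_samples)
loss_G = criterion(pred_generated2, torch.ones_like(pred_generated2))
opt_G.zero_grad()
loss_G.backward()
opt_G.step()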

This worked fine. I then updated the discriminator to consist of two steps (I want to save the intermediate output for later use). The first step returns a sequence of tensors, called features:

generated_samples = G(z)

# Train Discriminator
generated_features = get_features(generated_samples) # new
real_features = get_features(real_samples)
real_features = [f.detach() for f in real_features]

pred_generated = D([f.detach() for f in generated_features])  # detach each tensor in the sequence
pred_real = D(real_features)

loss = BCELoss(pred_generated, 0) + BCELoss(pred_real, 1)
loss.backward()

# Train Generator
pred_generated2 = D(generated_features)  # reuse the same feature tensors, not detached

loss = BCELoss(pred_generated2, 1)
loss.backward()

But I get an error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [320]] is at version 5; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

It seems to relate to this line of code:
pred_generated = D([f.detach() for f in generated_features])

How should I correctly detach tensors in generated_features as I pass them to the discriminator?

Why does it work for a single tensor and not a sequence?
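
In case the structure matters: get_features returns a list of intermediate activations and D consumes that list. Roughly something of this shape (an illustrative sketch, not my actual layers):

import torch
import torch.nn as nn

# Illustrative two-stage discriminator: the first stage exposes
# intermediate feature tensors, the second stage scores them.
feature_net = nn.ModuleList([nn.Linear(4, 32), nn.Linear(32, 64)])
head = nn.Linear(32 + 64, 1)

def get_features(x):
    # Returns a sequence (list) of tensors, one per feature layer.
    feats = []
    h = x
    for layer in feature_net:
        h = torch.relu(layer(h))
        feats.append(h)
    return feats

def D(features):
    # Consumes the sequence of feature tensors and produces a probability.
    return torch.sigmoid(head(torch.cat(features, dim=1)))

# e.g. score = D(get_features(torch.randn(32, 4)))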

Edit: Neither f.clone().detach() nor f.detach().clone() works.

Edit2: I tried anomaly detection and it points to the line generated_features = get_features(generated_samples) as the “root” of the error.
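
(For reference, I only had to set the flag near the top of the script to get that pointer:)

import torch

# Makes backward() report the forward op that produced the failing gradient.
torch.autograd.set_detect_anomaly(True)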

Edit3: The following code works, but then I have to extract the features one extra time, which is not efficient:

generated_samples = G(z)

# Train Discriminator
generated_features = get_features(generated_samples.detach()) # new: features built from detached samples, no graph back to G
real_features = get_features(real_samples)
real_features = [f.detach() for f in real_features]

pred_generated = D([f.detach() for f in generated_features])
pred_real = D(real_features)

loss = BCELoss(pred_generated, 0) + BCELoss(pred_real, 1)
loss.backward()

# Train Generator
generated_features = get_features(generated_samples) # new: recompute features so the graph back to G is intact
pred_generated2 = D(generated_features)

loss = BCELoss(pred_generated2, 1)
loss.backward()

I’m not sure what exactly is causing the issue, since D only gets detached tensors, so the “Train Discriminator” step shouldn’t update G.
Could you check whether any gradients are created in G during the “Train Discriminator” step, or whether the parameters of G were changed? I don’t know how the optimizer(s) were created, but if you are using an optimizer with running stats and already call its step for G in the “Train Discriminator” step (even with zero gradients in G), you might be hitting this issue.
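
Something like this could be used for the check (a rough sketch, assuming G is the generator module):

import torch

# After the "Train Discriminator" backward/step, before training G:
grads_in_G = any(p.grad is not None and p.grad.abs().sum().item() > 0
                 for p in G.parameters())
print("gradients accumulated in G:", grads_in_G)

# To see whether G's parameters change during the discriminator step,
# snapshot them before the step and compare afterwards:
before = [p.detach().clone() for p in G.parameters()]
# ... run the "Train Discriminator" step here ...
changed = any(not torch.equal(b, p.detach())
              for b, p in zip(before, G.parameters()))
print("G parameters changed during the D step:", changed)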