Different behavior of autograd on PyTorch 1.0 and >= 1.1

Hi, I’m trying to port my code from PyTorch 1.0 to a newer version.
The following code snippet works fine on PyTorch 1.0 but raises a RuntimeError on PyTorch 1.3.
Do you have any idea what might be causing this?

code:

import torch
import torch.nn as nn
import torch.nn.functional as F

# snippet from the loss __call__ in losses/trades.py; self.perturb_steps,
# self.step_size, and self.epsilon are hyperparameters of that class
criterion_kl = nn.KLDivLoss(reduction='sum')
model.eval()

# generate an adversarial example by iterated signed-gradient steps on the KL loss
x_adv = x_natural.detach() + 0.001 * torch.randn_like(x_natural)
for _ in range(self.perturb_steps):
    x_adv.requires_grad_()
    with torch.enable_grad():
        loss_kl = criterion_kl(F.log_softmax(model(x_adv), dim=1),
                               F.softmax(model(x_natural), dim=1))
    grad = torch.autograd.grad(loss_kl, [x_adv])[0].detach()
    # signed gradient step, then project back into the epsilon ball and the valid pixel range
    x_adv = x_adv.detach() + self.step_size * torch.sign(grad)
    x_adv = torch.min(torch.max(x_adv, x_natural - self.epsilon), x_natural + self.epsilon)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)

error message:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [640]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Traceback (most recent call last):
  File "main_adv.py", line 326, in <module>
    main()
  File "main_adv.py", line 187, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main_adv.py", line 240, in train
    output, loss = criterion(input, target, index, epoch, model, optimizer)
  File "/home/code/losses/trades.py", line 51, in __call__
    grad = torch.autograd.grad(loss_kl, [delta])[0].detach()
  File "/home/code/anaconda3/envs/torch1.3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)

Do you see the offending operation in the stack trace?
Based on the code snippet, I cannot find an in-place operation that might cause this error. :confused:
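
If the backtrace above the error message doesn’t show it, you could also run the perturbation step with anomaly detection enabled; the error will then include the forward-pass traceback of the operation that failed to compute its gradient. Roughly, reusing the names from your snippet:

with torch.autograd.detect_anomaly():
    loss_kl = criterion_kl(F.log_softmax(model(x_adv), dim=1),
                           F.softmax(model(x_natural), dim=1))
    grad = torch.autograd.grad(loss_kl, [x_adv])[0]

That extra traceback usually points straight at the offending in-place write.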

No. That’s all I see in the stack trace.
By the way, I was using DistributedDataParallel when I encountered the error. However, everything seems to work fine when I run the code on a single GPU, even with PyTorch 1.1.
Maybe this is related to the distributed scheme?
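
If it is, one experiment I could try (just a sketch, not my actual wrapping code; local_rank is a placeholder) would be to construct the DDP wrapper with buffer broadcasting disabled, since that synchronization copies into module buffers in place at every forward call:

from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(
    model,
    device_ids=[local_rank],      # placeholder, not from my actual script
    broadcast_buffers=False,      # experiment: skip the per-forward in-place buffer sync
)

I’m not sure that’s the real cause, though.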