Inception v3 RuntimeError with torch 1.0.0

I am running exactly the same code with torch==0.4.1 and torch==1.0.0.

torch==0.4.1: my code works with ResNet101 and Inception v3
torch==1.0.0: my code works with ResNet101 but Inception v3 fails with this stack trace:

Traceback (most recent call last):
  File "/home/anianruoss/experiments/imagenet/run_stadv_pytorch.py", line 100, in <module>
    flows_x0, optimizers[args.optimizer]
  File "/home/anianruoss/stAdv_pytorch/optimization.py", line 189, in pytorch_wrapper
    optim.step(closure)
  File "/home/anianruoss/venv/lib/python3.6/site-packages/torch/optim/adam.py", line 58, in step
    loss = closure()
  File "/home/anianruoss/stAdv_pytorch/optimization.py", line 183, in closure
    loss.backward()
  File "/home/anianruoss/venv/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anianruoss/venv/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

This is roughly what my code does:

import torch

# flows is the leaf tensor being optimized; CustomLoss and steps are defined elsewhere
optimizer = torch.optim.Adam([flows])

def closure():
    optimizer.zero_grad()           # reset gradients accumulated by the previous step
    loss = CustomLoss(flows).sum()
    loss.backward()
    return loss

for i in range(steps):
    optimizer.step(closure)

Do you know what could be the reason for this?


Did you solve it? I have the same issue and no idea what causes it.

Please try to use PyTorch 1.0.1.

Thank you for your help @Tony-Y. Unfortunately the problem remains even with torch==1.0.1.post2 and torchvision==0.2.1.

Maybe you have to modify CustomLoss slightly. A similar issue caused by a custom loss has been reported:

He resolved the problem by cloning tensors in the custom loss.
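To illustrate the workaround (this is only a hedged sketch, not the original CustomLoss, whose code was not posted): if the loss applies an in-place operation to a tensor that autograd saved for the backward pass, calling .clone() first and modifying the clone keeps the saved tensor intact.

```python
import torch

def custom_loss(flows):
    # sigmoid's backward re-uses its output, so autograd saves it;
    # clone() gives us a fresh tensor that is safe to modify in place
    activations = torch.sigmoid(flows).clone()
    activations *= 2.0  # in-place op now touches only the clone
    return activations.sum()

flows = torch.randn(4, requires_grad=True)
loss = custom_loss(flows)
loss.backward()  # succeeds; without clone() this would raise RuntimeError
```

Without the .clone(), the in-place `*= 2.0` would overwrite the saved sigmoid output and loss.backward() would raise the same "modified by an inplace operation" RuntimeError as in the stack trace above.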

That seems to solve the problem. Thanks! However, I am still confused as to why this occurs with Inception v3 and not with ResNet-101.

This article might help you understand why the cloning is needed:

http://www.yongfengli.tk/2018/04/13/inplace-operation-in-pytorch.html
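A minimal reproduction of the mechanism (independent of Inception v3 or the original code): an in-place operation on a tensor that autograd saved for the backward pass raises exactly this RuntimeError, while the out-of-place equivalent does not.

```python
import torch

# In-place variant: fails at backward()
x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)  # sigmoid's backward needs its output y
y += 1                # in-place add bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as err:
    print("in-place variant failed:", err)

# Out-of-place variant: works
x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)
y = y + 1             # creates a new tensor; the saved output is untouched
y.sum().backward()
print("out-of-place variant succeeded, grad:", x.grad)
```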