Hook function does not work well during backpropagation

I am getting the following error with AvgPool2d.

How can I get around it?

WARNING: Logging before InitGoogleLogging() is written to STDERR
W20211209 00:31:29.223598  8950 python_anomaly_mode.cpp:102] Warning: Error detected in AvgPool2DBackward0. Traceback of forward call that caused the error:
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 755, in <module>
    ltrain = train_NN(forward_pass_only=0)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 400, in train_NN
    y2 = graph.cells[c].nodes[n].connections[cc].edges[e].forward_f(x2)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 106, in forward_f
    return self.f(x)
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/pooling.py", line 616, in forward
    return F.avg_pool2d(input, self.kernel_size, self.stride,
 (function _print_stack)
Traceback (most recent call last):
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 755, in <module>
    ltrain = train_NN(forward_pass_only=0)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 525, in train_NN
    Ltrain.backward()
  File "/usr/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Output 0 of BackwardHookFunctionBackward is a view and its base or another view of its base has been modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.

Process finished with exit code 1

I see that you might be using backward hooks from here when debug is enabled.
Are you manipulating these tensors somehow? If not, do you see the issue without the hooks? Would it work to detach the tensors before appending them to the global list (assuming that fits your use case)?
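For what it's worth, a minimal sketch of that suggestion, assuming the hooks append gradients to a global debug list (the names `captured_grads` and `save_grad_hook` are illustrative, not from the original gdas.py):

import torch
import torch.nn as nn

# Hypothetical debug buffer; names are illustrative.
captured_grads = []

def save_grad_hook(module, grad_input, grad_output):
    # Detach (and clone) so the stored tensors no longer participate in
    # the autograd graph; holding live references to graph tensors in a
    # global list can interact badly with view/in-place tracking during
    # the backward pass.
    captured_grads.append(grad_output[0].detach().clone())

pool = nn.AvgPool2d(kernel_size=2)
handle = pool.register_full_backward_hook(save_grad_hook)

x = torch.randn(4, 3, 32, 32, requires_grad=True)
pool(x).sum().backward()
handle.remove()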

If I comment out the backward hooks, I get a different RuntimeError:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W20211209 21:10:14.966616  4122 python_anomaly_mode.cpp:102] Warning: Error detected in AvgPool2DBackward0. Traceback of forward call that caused the error:
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 755, in <module>
    ltrain = train_NN(forward_pass_only=0)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 400, in train_NN
    y2 = graph.cells[c].nodes[n].connections[cc].edges[e].forward_f(x2)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 106, in forward_f
    return self.f(x)
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/pooling.py", line 616, in forward
    return F.avg_pool2d(input, self.kernel_size, self.stride,
 (function _print_stack)
Traceback (most recent call last):
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 755, in <module>
    ltrain = train_NN(forward_pass_only=0)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 525, in train_NN
    Ltrain.backward()
  File "/usr/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 3, 32, 32]], which is output 0 of AddBackward0, is at version 24; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Process finished with exit code 1

I guess you might be hitting the same issue as in your previous topic.
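To illustrate what that second error usually means (this is a self-contained sketch, not the original gdas.py code): avg_pool2d saves its input tensor for the backward pass, so mutating that input in place afterwards bumps its version counter, and backward() then fails with exactly this version-mismatch message. The fix is to use an out-of-place operation instead:

import torch
import torch.nn.functional as F

x = torch.randn(4, 3, 32, 32, requires_grad=True)
y = x + 0.0                        # y is "output 0 of AddBackward0"
out = F.avg_pool2d(y, kernel_size=2)   # saves y for its backward pass

# y += 1.0                         # in-place: would reproduce the RuntimeError

y = y + 1.0                        # out-of-place: the saved tensor stays at version 0
(out.sum() + y.sum()).backward()   # succeeds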


This code commit solved the issue.
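For readers hitting the same error: the first RuntimeError itself points at the general shape of the fix, namely cloning the output of the custom Function so it is no longer a view of its input. A hedged sketch of that pattern (the Function below is purely illustrative; it is not what the linked commit actually changed):

import torch

class Identity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Returning x as-is makes the output a view of the input, which
        # clashes with later in-place ops under autograd's view tracking.
        # Cloning the output avoids the "view ... modified inplace" error.
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.randn(4, 3, 32, 32, requires_grad=True)
y = Identity.apply(x)
y.sum().backward()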