One of the variables needed for gradient computation has been modified

I am trying to use eager execution with custom nodes that hold parameters.
Since training runs over multiple iterations, I call backward with the following option:

loss.backward(retain_graph = True)

Then I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-80ae8e772693> in <module>()
     19       y = model()
     20       loss = criterion(y, t)
---> 21       loss.backward(retain_graph = True)
     22       optimizer.step()

1 frames
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    105                 products. Defaults to ``False``.
    106         """
--> 107         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    108 
    109     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [28, 128]] is at version 25088; expected version 21504 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
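
The hint at the end of the message refers to PyTorch's anomaly detection. A minimal sketch of turning it on (it only locates the failing in-place operation, it does not fix it; the loop below is assumed otherwise unchanged):

torch.autograd.set_detect_anomaly(True)  # slows training; enable only while debugging

y = model()
loss = criterion(y, t)
loss.backward(retain_graph = True)  # the error now reports a traceback of the op that modified the tensor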

Does this mean that the back-propagation path is broken (i.e. it cannot be computed)?

The training code is as follows:

model.train()
  
for epoch in range(EPOCH):
  for x, t in dataloader_train:
    t = t.to(device)
    for time in range(TIME_STEPS):

      x_ = x[0][0][time]
      x_ = x_.to(device)
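      # FIFO shift: move each fw_x / fw_h entry one slot up, then write the new input into slot 0 (all in-place)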
      for index_a in range(NUM_INPUT):
        for index_b in range(NUM_HIDDEN-1, 0, -1):
          model.fw_x[index_a][index_b] = model.fw_x[index_a][index_b - 1]
          model.fw_h[index_a][index_b] = model.fw_h[index_a][index_b - 1]
      
        model.fw_x[index_a][0:NUM_INPUT] = x_
        model.fw_h[index_a][0] = 0.0

      model.zero_grad()
      y = model()
      loss = criterion(y, t)
      loss.backward(retain_graph = True)
      optimizer.step()

Is this kind of FIFO coding not allowed? If so, how can I write the same function in
PyTorch grammar?

Hi,

Writing into Tensors in-place like this can be problematic.
If you can use lists instead, that will solve the problem.
Otherwise, you need to avoid the problematic in-place operations, either by creating new Tensors every time or by adding a clone() of fw_x/fw_h after all your in-place ops.
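
For example, a minimal sketch of the first two options, using the names from the training code above and assuming fw_x is a [NUM_INPUT, NUM_HIDDEN] Tensor and x_ a 1-D Tensor of length NUM_INPUT (adapt the slicing to your model):

# option 1: keep a plain Python list and stack it into a fresh Tensor for the forward pass
# (fw_x_list is initialised once before the loop, e.g. NUM_HIDDEN zero Tensors of length NUM_INPUT)
fw_x_list.pop()                              # drop the oldest entry
fw_x_list.insert(0, x_)                      # push the newest input to the front
model.fw_x = torch.stack(fw_x_list, dim=1)   # fresh [NUM_INPUT, NUM_HIDDEN] Tensor every step

# option 2: build a new Tensor directly instead of shifting in-place
model.fw_x = torch.cat((x_.unsqueeze(1), model.fw_x[:, :-1]), dim=1)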


Hi,

I tried this style:

model.fw_x = torch.stack((model.fw_x[1:], x_))

where “x_” and “fw_x” are one- and two-dimensional, respectively.
torch.cat() needs the two inputs to have the same number of dimensions, so instead I used torch.stack() with “[1:]” to get the FIFO behavior, but I get this error:

RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:702

I still do not understand it. Any suggestions?
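
For reference, torch.stack() inserts a new dimension in front of each input before concatenating, so the 2-D fw_x[1:] becomes 3-D while the 1-D x_ becomes 2-D, which is where "got 3 and 2" comes from. A sketch of the usual cat-based FIFO update instead, assuming x_ has the same length as one row of fw_x:

model.fw_x = torch.cat((model.fw_x[1:], x_.unsqueeze(0)), dim=0)  # drop the oldest row, append x_ as the newest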