Problem encountered during backprop: one of the variables needed for gradient computation has been modified by an inplace operation

Hi, I’m trying to finish Assignment 4 from the course EECS 498-007, but there is a problem in my code that makes the backward pass fail.

Here is my code:

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a tanh
    activation function.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D).
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases, of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h = torch.tanh(x @ Wx + prev_h @ Wh + b)
    cache = (x, prev_h, Wx, Wh, b, next_h)
    return next_h, cache

def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The RNN uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the RNN forward, we return the hidden states for all timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D).
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases, of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H).
    - cache: Values needed in the backward pass
    """
    N, T, D = x.size()
    _, H = h0.size()
    h = torch.zeros(N, T, H, dtype=torch.double, device='cuda')
    cache = []
    # First timestep: write the new hidden state into h in place
    h[:, 0, :], c = rnn_step_forward(x[:, 0, :], h0, Wx, Wh, b)
    cache.append(c)
    # Remaining timesteps: read the previous hidden state back out of h
    # and write the next one into h, again in place
    for i in range(T - 1):
        h[:, i+1, :], c = rnn_step_forward(x[:, i+1, :], h[:, i, :], Wx, Wh, b)
        cache.append(c)
    return h, cache

N, D, T, H = 2, 3, 10, 5

# set requires_grad=True
x = torch.randn(N, T, D, **to_double_cuda, requires_grad=True)
h0 = torch.randn(N, H, **to_double_cuda, requires_grad=True)
Wx = torch.randn(D, H, **to_double_cuda, requires_grad=True)
Wh = torch.randn(H, H, **to_double_cuda, requires_grad=True)
b = torch.randn(H, **to_double_cuda, requires_grad=True)

out, cache = rnn_forward(x, h0, Wx, Wh, b)

dout = torch.randn(*out.shape, **to_double_cuda)
with torch.autograd.set_detect_anomaly(True):
  out.backward(dout) # the magic happens here!

And the error goes like this:

<ipython-input-34-2ebf82a736a2> in <module>
     20 # backward with autograd
     21 with torch.autograd.set_detect_anomaly(True):
---> 22   out.backward(dout) # the magic happens here!

E:\developer\Anaconda\envs\tensorflow\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    183                 products. Defaults to ``False``.
    184         """
--> 185         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    186 
    187     def register_hook(self, hook):

E:\developer\Anaconda\envs\tensorflow\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
--> 127         allow_unreachable=True)  # allow_unreachable flag
    128 
    129 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.DoubleTensor [2, 5]], which is output 0 of SliceBackward, is at version 10; expected version 9 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
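
To narrow things down, I tried to reproduce what I think is the same failure mode in a small standalone snippet, outside the assignment code (the names, shapes, and CPU-only setup below are just made up for this test):

import torch

w = torch.randn(5, 5, dtype=torch.double, requires_grad=True)
h0 = torch.randn(2, 5, dtype=torch.double, requires_grad=True)
h = torch.zeros(2, 3, 5, dtype=torch.double)

h[:, 0, :] = torch.tanh(h0 @ w)   # write step 0 into h, in place
nxt = torch.tanh(h[:, 0, :] @ w)  # read step 0 back out of h: this slice is a view of h,
                                  # and the matmul saves it for the backward pass
h[:, 1, :] = nxt                  # write step 1 into h, bumping h's version counter
nxt.sum().backward()              # fails with the same kind of inplace-modification error

My guess from the hint about version counters is that the matmul saved the slice h[:, 0, :] for its backward pass, and the later assignment into h counts as modifying that saved tensor, but I’m not sure.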

What’s going wrong and how should I debug this?

I modified my code to

    N, T, D = x.size()
    _, H = h0.size()
    h = torch.zeros(N, T, H, dtype=torch.double, device='cuda')
    cache = []
    # Keep the hidden state being fed forward in a separate tensor (tmp)
    # instead of reading it back out of h
    tmp, c = rnn_step_forward(x[:, 0, :], h0, Wx, Wh, b)
    cache.append(c)
    h[:, 0, :] = tmp
    for i in range(T - 1):
        tmp, c = rnn_step_forward(x[:, i+1, :], tmp, Wx, Wh, b)
        h[:, i+1, :] = tmp
        cache.append(c)
    return h, cache

and it works, but I still don’t understand what the difference is between the two versions or why the first one fails :(
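
Running the same standalone test with the structure of the fix seems to work, which makes me think the important change is that the matmul now reads tmp, a separate tensor, rather than a view of h (again, the names and shapes are just made up for the test):

import torch

w = torch.randn(5, 5, dtype=torch.double, requires_grad=True)
h0 = torch.randn(2, 5, dtype=torch.double, requires_grad=True)
h = torch.zeros(2, 3, 5, dtype=torch.double)

tmp = torch.tanh(h0 @ w)
h[:, 0, :] = tmp                  # still writing into h in place...
nxt = torch.tanh(tmp @ w)         # ...but the matmul reads tmp, which is not a view of h
h[:, 1, :] = nxt                  # so this write doesn't touch anything saved for backward
nxt.sum().backward()              # no error

Is that the right way to understand the difference, i.e. the problem is reading the previous hidden state back out of h as a view, rather than the in-place writes into h themselves?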