Hi, I’m trying to finish Assignment 4 from the EECS 498-007 lectures, but there is a problem in my code that makes backpropagation fail.
Here is my code:
def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a
    tanh activation function.

    The input data has dimension D, the hidden state has dimension H, and we
    use a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D)
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases, of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h = torch.tanh(x @ Wx + prev_h @ Wh + b)
    cache = (x, prev_h, Wx, Wh, b, next_h)
    return next_h, cache
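For what it’s worth, a single step seems fine on its own. This CPU sanity check (my own, sizes chosen arbitrarily, not part of the assignment) backprops without complaint, so I suspect the problem is in how rnn_forward chains the steps together:

```python
import torch

# One isolated step of the vanilla RNN, same math as rnn_step_forward
N, D, H = 2, 3, 5
x = torch.randn(N, D, dtype=torch.double, requires_grad=True)
prev_h = torch.randn(N, H, dtype=torch.double, requires_grad=True)
Wx = torch.randn(D, H, dtype=torch.double, requires_grad=True)
Wh = torch.randn(H, H, dtype=torch.double, requires_grad=True)
b = torch.randn(H, dtype=torch.double, requires_grad=True)

next_h = torch.tanh(x @ Wx + prev_h @ Wh + b)
next_h.sum().backward()          # no error for a single step
print(x.grad.shape)              # torch.Size([2, 3])
```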
def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an
    input sequence composed of T vectors, each of dimension D. The RNN uses a
    hidden size of H, and we work over a minibatch containing N sequences.
    After running the RNN forward, we return the hidden states for all
    timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D)
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases, of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H)
    - cache: Values needed in the backward pass
    """
    N, T, D = x.size()
    _, H = h0.size()
    h = torch.zeros(N, T, H, dtype=torch.double, device='cuda')
    cache = []
    h[:, 0, :], c = rnn_step_forward(x[:, 0, :], h0, Wx, Wh, b)
    cache.append(c)
    for i in range(T - 1):
        h[:, i+1, :], c = rnn_step_forward(x[:, i+1, :], h[:, i, :], Wx, Wh, b)
        cache.append(c)
    return h, cache
N, D, T, H = 2, 3, 10, 5

# set requires_grad=True
x = torch.randn(N, T, D, **to_double_cuda, requires_grad=True)
h0 = torch.randn(N, H, **to_double_cuda, requires_grad=True)
Wx = torch.randn(D, H, **to_double_cuda, requires_grad=True)
Wh = torch.randn(H, H, **to_double_cuda, requires_grad=True)
b = torch.randn(H, **to_double_cuda, requires_grad=True)

out, cache = rnn_forward(x, h0, Wx, Wh, b)
dout = torch.randn(*out.shape, **to_double_cuda)

with torch.autograd.set_detect_anomaly(True):
    out.backward(dout)  # the magic happens here!
And the error goes like this:
<ipython-input-34-2ebf82a736a2> in <module>
20 # backward with autograd
21 with torch.autograd.set_detect_anomaly(True):
---> 22 out.backward(dout) # the magic happens here!
E:\developer\Anaconda\envs\tensorflow\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
183 products. Defaults to ``False``.
184 """
--> 185 torch.autograd.backward(self, gradient, retain_graph, create_graph)
186
187 def register_hook(self, hook):
E:\developer\Anaconda\envs\tensorflow\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
125 Variable._execution_engine.run_backward(
126 tensors, grad_tensors, retain_graph, create_graph,
--> 127 allow_unreachable=True) # allow_unreachable flag
128
129
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.DoubleTensor [2, 5]], which is output 0 of SliceBackward, is at version 10; expected version 9 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
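I can reproduce what I think is the same failure with a much smaller snippet. My guess is that the in-place slice writes into the preallocated h are the culprit (each write seems to bump the tensor's version while an earlier slice read is still saved for backward), but I'm not sure I understand the mechanism:

```python
import torch

# Minimal CPU reproduction (my attempt to isolate the failure):
# write into a preallocated tensor, read a slice back, then write again.
x = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
h = torch.zeros(2, 2, 3, dtype=torch.double)
h[0] = torch.tanh(x)   # first in-place slice write, like h[:, 0, :] = ...
step = h[0] ** 2       # reads h[0]; autograd saves this view of h for backward
h[1] = step            # second in-place write bumps h's version counter

err = None
try:
    h.sum().backward()
except RuntimeError as e:
    err = e
print(err is not None)  # True: same "modified by an inplace operation" error
```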
What’s going wrong and how should I debug this?
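For what it's worth, if I collect the per-step hidden states in a Python list and torch.stack them at the end instead of writing into a preallocated h, backward() runs without errors (CPU sketch below, my own rewrite, inlining the step for brevity), but I'd still like to understand why the original version fails:

```python
import torch

def rnn_forward_stack(x, h0, Wx, Wh, b):
    # Same recurrence, but no in-place slice assignment: each step's output
    # is appended to a list and stacked into (N, T, H) at the end.
    N, T, D = x.size()
    hs, prev_h = [], h0
    for t in range(T):
        prev_h = torch.tanh(x[:, t, :] @ Wx + prev_h @ Wh + b)
        hs.append(prev_h)
    return torch.stack(hs, dim=1)

N, D, T, H = 2, 3, 10, 5
x = torch.randn(N, T, D, dtype=torch.double, requires_grad=True)
h0 = torch.randn(N, H, dtype=torch.double, requires_grad=True)
Wx = torch.randn(D, H, dtype=torch.double, requires_grad=True)
Wh = torch.randn(H, H, dtype=torch.double, requires_grad=True)
b = torch.randn(H, dtype=torch.double, requires_grad=True)

out = rnn_forward_stack(x, h0, Wx, Wh, b)
out.backward(torch.randn_like(out))  # no error here
print(Wx.grad.shape)                 # torch.Size([3, 5])
```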