[RNN] How do I get the gradient of the loss wrt the hidden state at each timestep?

I know I can use the recurrent cells to accomplish this; however, I'd love to know whether it is possible with nn.RNN instead of nn.RNNCell. The ReLU nonlinearity, which is the one I'm currently experimenting with, behaves strangely in nn.RNNCell (see my previous post here), so I'd like to avoid using nn.RNNCell + ReLU to examine the gradients of the hidden states. I've tried all_hidden_states.retain_grad() and torch.autograd.grad(loss, all_hidden_states, create_graph=True, retain_graph=True, allow_unused=True), but the former fails with 'NoneType' object has no attribute 'clone', while the latter returns nothing. Please advise, thanks!
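For reference, here is a minimal sketch of one way this might be done while staying with nn.RNN: feed the sequence one timestep at a time and carry the hidden state forward by hand, so that each h_t is a separate tensor in the autograd graph and retain_grad() can be called on it. The sizes, the zero initial state, and the squared-sum loss are placeholders I made up for illustration, not anything from my actual model. My understanding (which may be incomplete) is that when the whole sequence goes through nn.RNN in a single call, the step-to-step recurrence happens inside that one op, so a gradient taken on the returned output would at best reflect how the loss uses that tensor directly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, batch, input_size, hidden_size = 5, 3, 4, 6

rnn = nn.RNN(input_size, hidden_size, nonlinearity="relu")
x = torch.randn(seq_len, batch, input_size)  # (seq_len, batch, input_size)

h = torch.zeros(1, batch, hidden_size)  # (num_layers, batch, hidden_size)
hidden_states = []
for t in range(seq_len):
    # Feed a length-1 slice so each h_t is its own node in the graph.
    _, h = rnn(x[t:t + 1], h)
    h.retain_grad()  # keep .grad on this non-leaf tensor after backward()
    hidden_states.append(h)

# Placeholder loss that depends only on the final hidden state.
loss = hidden_states[-1].pow(2).sum()
loss.backward()

# One gradient tensor per timestep, each shaped like the hidden state.
hidden_grads = [h_t.grad for h_t in hidden_states]
```

Because every h_t is the tensor actually passed into the next step, hidden_grads[t] should include the contribution that flows back through later timesteps. Calling torch.autograd.grad(loss, hidden_states) instead of retain_grad() plus backward() ought to give the same values as a tuple. The per-step loop gives up the speed of a single fused call, so it is probably best kept for inspection rather than training.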