I know I can use the recurrent cells to accomplish this; however, I'd love to know whether it's possible with `nn.RNN` instead of `nn.RNNCell`. The ReLU nonlinearity, which is the one I'm currently experimenting with, has weird behaviors in `nn.RNNCell` (see my previous post here), so I'd like to avoid the `nn.RNNCell` + ReLU combination when examining the gradients of the hidden states. I've tried `all_hidden_states.retain_grad()` and `torch.autograd.grad(loss, all_hidden_states, create_graph=True, retain_graph=True, allow_unused=True)`, but the former raises `'NoneType' object has no attribute 'clone'` while the latter returns nothing. Please advise, thanks!
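For reference, here is a stripped-down sketch of what I'm attempting. The sizes, input, and loss below are placeholders for illustration, not my actual model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy nn.RNN with the ReLU nonlinearity (sizes are arbitrary placeholders)
rnn = nn.RNN(input_size=4, hidden_size=8, nonlinearity='relu', batch_first=True)
x = torch.randn(2, 5, 4)                       # (batch, seq_len, input_size)

# nn.RNN returns the hidden state at every timestep plus the final one
all_hidden_states, h_n = rnn(x)                # (batch, seq_len, hidden_size)

# Attempt 1: keep .grad on this non-leaf tensor after backward()
all_hidden_states.retain_grad()

loss = all_hidden_states.pow(2).mean()         # stand-in loss for illustration

# Attempt 2: ask autograd for the gradient directly
(g,) = torch.autograd.grad(loss, all_hidden_states,
                           create_graph=True, retain_graph=True,
                           allow_unused=True)

loss.backward()
print(all_hidden_states.grad)                  # attempt 1's result
print(g)                                       # attempt 2's result
```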