VanillaRNN without activation function

Once you remove the activation function, isn't the RNN cell just a linear model (with input torch.cat([inp_t, h_t_minus_1], -1) and output h_t) applied at each timestep?
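To make that concrete, here is a minimal sketch (the class and variable names are mine, not from your code) of how such an activation-free cell reduces to a single nn.Linear over the concatenated input and previous hidden state:

```python
import torch
import torch.nn as nn

# Sketch only: with the activation removed, one step of the cell is just
# a linear map over the concatenated input and previous hidden state.
class LinearRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, inp_t, h_t_minus_1):
        # h_t = W [inp_t; h_{t-1}] + b  -- no nonlinearity
        return self.linear(torch.cat([inp_t, h_t_minus_1], -1))


# Unrolling over time just applies the same linear map at every step.
cell = LinearRNNCell(input_size=8, hidden_size=16)
inp = torch.randn(5, 3, 8)   # (time, batch, features)
h = torch.zeros(3, 16)
for t in range(inp.size(0)):
    h = cell(inp[t], h)
```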
I would expect the speed penalty to be small. For a much more complex model, the LLTM, the C++ extension tutorial shows a speedup (reduction in running time) of about 30% when going from Python+Autograd to a C++ forward plus a custom backward, and my analysis seems to split this into ~10%pts for "move to C++" and ~20%pts for "custom backward". But again, that is for a much more complex model; I would expect the benefit in your case to be considerably smaller.
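For reference, "custom backward" here means writing the gradients by hand instead of letting autograd record the forward. A purely illustrative Python sketch for the linear cell above (the LLTM tutorial does the analogous thing in C++; the names are mine) could look like this:

```python
import torch

# Illustrative hand-written backward for h_t = [inp_t; h_{t-1}] @ W.t() + b.
class LinearCellFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp_t, h_t_minus_1, weight, bias):
        x = torch.cat([inp_t, h_t_minus_1], -1)
        ctx.save_for_backward(x, weight)
        ctx.input_size = inp_t.size(-1)
        return x @ weight.t() + bias

    @staticmethod
    def backward(ctx, grad_h_t):
        x, weight = ctx.saved_tensors
        grad_x = grad_h_t @ weight            # gradient w.r.t. [inp_t; h_{t-1}]
        grad_weight = grad_h_t.t() @ x        # gradient w.r.t. W
        grad_bias = grad_h_t.sum(0)           # gradient w.r.t. b
        grad_inp, grad_h_prev = grad_x.split(
            [ctx.input_size, grad_x.size(-1) - ctx.input_size], dim=-1)
        return grad_inp, grad_h_prev, grad_weight, grad_bias
```

For a model this simple there is little work for a custom backward to save, which is why I would not expect much from it here.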

Best regards

Thomas