I’m trying to implement the Neural Expectation Maximization architecture in PyTorch.
Unfortunately, this model requires a vanilla RNN with no activation function (or with a sigmoid activation). Currently, the pytorch._backend library seems to support only RNNs with tanh or ReLU activations. The RNNBase module is undocumented but appears to support several execution modes. I’m not sure whether those modes are limited to LSTM, GRU, RNN_tanh and RNN_ReLU.
Does anyone know whether calling the RNNBase module with an empty mode works and, if so, what output I should expect?
If I have to implement a simple RNN without activation in native Python, does anyone know how slow it would be compared with the C implementation?
Once you remove the activation function, isn’t the RNN cell just a linear model (with input torch.cat([inp_t, h_t_minus_1], -1) and output h_t) applied at each timestep?
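To make that concrete, here is a minimal sketch of such a cell unrolled over a sequence. The class name ActivationFreeRNN, the activation argument and the (seq_len, batch, input_size) layout are my own choices for illustration, not anything taken from the PyTorch internals; passing torch.sigmoid gives the sigmoid variant mentioned in the question.

```python
import torch
import torch.nn as nn


class ActivationFreeRNN(nn.Module):
    """Elman-style RNN with a configurable activation (none by default).

    Hypothetical helper for illustration; not part of PyTorch.
    """

    def __init__(self, input_size, hidden_size, activation=None):
        super().__init__()
        self.hidden_size = hidden_size
        # A single linear layer applied to the concatenation [x_t, h_{t-1}].
        self.linear = nn.Linear(input_size + hidden_size, hidden_size)
        # None means "no activation"; pass e.g. torch.sigmoid otherwise.
        self.activation = activation

    def forward(self, inputs, h0=None):
        # inputs: (seq_len, batch, input_size), the default nn.RNN layout.
        seq_len, batch, _ = inputs.shape
        h = inputs.new_zeros(batch, self.hidden_size) if h0 is None else h0
        outputs = []
        for t in range(seq_len):
            h = self.linear(torch.cat([inputs[t], h], dim=-1))
            if self.activation is not None:
                h = self.activation(h)
            outputs.append(h)
        # Stacked hidden states (seq_len, batch, hidden_size) and the last state.
        return torch.stack(outputs, dim=0), h


rnn = ActivationFreeRNN(input_size=3, hidden_size=4)
out, h_last = rnn(torch.randn(5, 2, 3))
print(out.shape, h_last.shape)
```

Since everything is built from ordinary tensor ops, autograd handles the backward pass through the loop without any extra work.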
I would expect the speed penalty to be small. For a much more complex model, LLTM, the C++ tutorial shows a 30% reduction in running time when going from Python + autograd to a forward and custom backward implemented in C++, and my analysis seems to split this into roughly 10 percentage points for the move to C++ and roughly 20 percentage points for the custom backward. But again, that is for a much more complex model; I would expect the benefits in your case to be considerably more marginal.
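If you want a number for your own setup rather than my guess, a rough timing harness along these lines should do. The tensor sizes, the repeat count and the helper names python_loop_rnn and time_forward_backward are arbitrary choices of mine; this runs on the CPU (on a GPU you would also need torch.cuda.synchronize() around the timed region), and it compares against the built-in tanh RNN only as a reference point.

```python
import time

import torch
import torch.nn as nn


def python_loop_rnn(linear, inputs):
    """Activation-free RNN: h_t = linear([x_t, h_{t-1}]), looped in Python."""
    h = inputs.new_zeros(inputs.shape[1], linear.out_features)
    outputs = []
    for x_t in inputs:  # iterate over the time dimension
        h = linear(torch.cat([x_t, h], dim=-1))
        outputs.append(h)
    return torch.stack(outputs)


def time_forward_backward(fn, repeats=20):
    """Average wall-clock seconds per call of fn, after one warm-up call."""
    fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Arbitrary sizes, chosen only for illustration.
seq_len, batch, input_size, hidden_size = 50, 16, 32, 64
inputs = torch.randn(seq_len, batch, input_size)
builtin = nn.RNN(input_size, hidden_size, nonlinearity="tanh")
linear = nn.Linear(input_size + hidden_size, hidden_size)


def run_builtin():
    out, _ = builtin(inputs)
    out.sum().backward()  # gradients accumulate; irrelevant for timing


def run_loop():
    python_loop_rnn(linear, inputs).sum().backward()


t_builtin = time_forward_backward(run_builtin)
t_loop = time_forward_backward(run_loop)
print(f"nn.RNN (tanh):               {t_builtin * 1e3:.2f} ms per forward+backward")
print(f"Python loop (no activation): {t_loop * 1e3:.2f} ms per forward+backward")
```

The ratio will depend heavily on sequence length, batch size and device, so it is worth measuring with shapes close to the ones you actually train on.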
Thanks for the answer.
Yes, you are right: if you remove the activation function, an RNN becomes almost a linear model.
I’m finishing the implementation. I’m not sure how slow it will be, but I hope it will be decent.