I think the forward function only uses C++ when it handles tensors. The forward function itself is ordinary Python that defines the path the tensors take through the nn; the tensor operations it calls are what run in C++. Besides the C++ used for the nn computations, everything you need to design the flow through the nn(s) is in Python, and that part is in the docs.

I got an LSTM network running in PyTorch by following the PyTorch tutorial example for sentiment analysis, but it took a lot of hacking to make it work on anything besides word analysis. It still outputs a prediction for every time step instead of one per sequence in the batch, which is different from what I'm used to in Keras. I'm now almost certain that this comes down to the forward function, but I still can't find any information on common practices for writing it. One thing that was preventing it from working was that the hidden state and cell state weren't being grabbed and passed to the LSTM correctly.
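For what it's worth, here is a minimal sketch of how the two points above (one prediction per time step, and passing the hidden and cell state) could be handled inside forward. It is not the tutorial's code: the class name `SequenceClassifier`, the layer sizes, and the choice of `batch_first=True` are all assumptions made for illustration. `nn.LSTM` returns an output for every time step, so the sketch keeps only the last step to get one prediction per sequence, which is roughly what Keras does by default with `return_sequences=False`.

```python
import torch
import torch.nn as nn


class SequenceClassifier(nn.Module):
    """Hypothetical model: one prediction per sequence, not per time step."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # batch_first=True is an assumption: inputs are (batch, seq_len, features)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x, state=None):
        # state is an optional (h_0, c_0) tuple; None lets nn.LSTM start from zeros
        out, (h_n, c_n) = self.lstm(x, state)
        # out holds every time step: (batch, seq_len, hidden_size)
        last_step = out[:, -1, :]  # keep only the final time step
        logits = self.fc(last_step)  # (batch, num_classes)
        # return the new state so the caller can pass it back in if needed
        return logits, (h_n, c_n)


# Made-up sizes, just to check the shapes
model = SequenceClassifier(input_size=8, hidden_size=16, num_classes=3)
x = torch.randn(4, 10, 8)  # 4 sequences, 10 time steps, 8 features each
logits, (h_n, c_n) = model(x)
print(logits.shape)  # torch.Size([4, 3])
```

Passing the returned `(h_n, c_n)` tuple back in as `state` carries the hidden and cell state over to the next call; leaving it as `None` starts each batch from a zero state. Taking `out[:, -1, :]` assumes the sequences are not padded; with padded batches the last real step would have to be selected per sequence instead.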