Do I need to learn RNNs and LSTMs before learning the Transformer model?
Well, strictly speaking, an LSTM is a special instance of an RNN. So getting a basic grasp of the ideas behind RNNs and their inner workings would be enough.
In my opinion, you don’t strictly need to learn about RNNs first, but I would still recommend it (again, at least the basic concepts). Firstly, the limitations of RNNs (mainly the difficulty of capturing long-term dependencies and the lack of parallelism) are among the problems the original Transformer paper set out to address. Secondly, the concept of attention, at least as far as I know, was first proposed (or at least popularized) for RNN-based models in this paper. The Transformer paper generalized the idea of attention, but I think the concept is more intuitive in the context of RNNs.
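To make the parallelism point a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the function names (`rnn_states`, `attention`) and the toy weights `w_x` and `w_h` are made up for this example and do not come from either paper. The RNN loop has to run step by step because each hidden state needs the previous one, while each attention output depends only on the inputs, so all positions could be computed at the same time.

```python
import math

def rnn_states(xs, w_x=0.5, w_h=0.9):
    """Toy scalar RNN (hypothetical weights): h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    states = []
    for x in xs:  # must run in order: step t needs h from step t-1
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:  # iterations are independent, so they could run in parallel
        weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    print(rnn_states([1.0, 0.0, 0.0]))
    print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [3.0]]))
```

Note that in `rnn_states` the first input still influences later states only through the chain of hidden states, which is where the long-term dependency problem comes from, whereas in `attention` every query looks at every key directly.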
If you just want to train a Transformer model, just go for it. For a deeper understanding, it’s always good to learn about (closely) related ideas to put things into better context.
@vdw Thank you for your suggestion