In this article: [NLP From Scratch: Translation with a Sequence to Sequence Network and Attention — PyTorch](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
The Bahdanau attention implementation uses linear layers with bias; to my understanding, they should be without bias, since the score function in the original paper, $e = v_a^\top \tanh(W_a s + U_a h)$, has no bias term. However, that's as far as my understanding extends. What are the effects of including a bias term?
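For reference, here is a minimal sketch of the attention module I mean, close to the tutorial's `BahdanauAttention`. The `bias` flag is my own addition for illustration: `nn.Linear` defaults to `bias=True`, whereas `bias=False` would match the paper's score function.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Additive (Bahdanau) attention: score = Va^T tanh(Wa q + Ua k)."""
    def __init__(self, hidden_size, bias=True):
        super().__init__()
        # nn.Linear defaults to bias=True; the paper's score function has
        # no bias term, which would correspond to bias=False here.
        self.Wa = nn.Linear(hidden_size, hidden_size, bias=bias)
        self.Ua = nn.Linear(hidden_size, hidden_size, bias=bias)
        self.Va = nn.Linear(hidden_size, 1, bias=bias)

    def forward(self, query, keys):
        # query: (batch, 1, hidden), keys: (batch, seq_len, hidden)
        scores = self.Va(torch.tanh(self.Wa(query) + self.Ua(keys)))
        scores = scores.squeeze(2).unsqueeze(1)  # (batch, 1, seq_len)
        weights = F.softmax(scores, dim=-1)      # attention distribution
        context = torch.bmm(weights, keys)       # (batch, 1, hidden)
        return context, weights
```

So concretely: is there any difference in behavior or trainability between constructing this with `bias=True` (as the tutorial does, implicitly) and `bias=False` (as the paper's equation suggests)?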