I managed to fix it. For future reference: I was sloppy and did not reshape the bias term properly. The step I forgot was transposing the bias after expanding it:
_s_bias = _s + bias.expand(bias_dim[0], _s.size()[0]).transpose(0,1)
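In case it helps anyone else, here is a minimal, self-contained sketch of what that line does. The shapes are my assumption, not something stated above: `_s` is `(batch, features)` and `bias` is `(features, 1)`, with `bias_dim` standing for `bias.size()`. The helper name `add_bias` is made up for illustration.

```python
import torch

def add_bias(s, bias):
    # Assumed shapes: s is (batch, features), bias is (features, 1).
    bias_dim = bias.size()
    # Repeat the bias column once per batch row: (features, batch).
    expanded = bias.expand(bias_dim[0], s.size()[0])
    # Transpose to (batch, features) so it lines up with s element-wise.
    return s + expanded.transpose(0, 1)

s = torch.zeros(3, 2)                  # batch of 3, 2 features
bias = torch.tensor([[1.0], [2.0]])    # one bias value per feature
out = add_bias(s, bias)
print(out.shape)
```

Without the transpose, the expanded bias is `(features, batch)` and does not match `_s`. On PyTorch versions with broadcasting, `s + bias.t()` should give the same result without the explicit `expand`.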
Thank you for the wonderful effort you’ve put in here; it makes debugging a lot easier.
BTW:
Can you explain this?
