Single attention head raises an error during transformer evaluation

RuntimeError: Only support when num_heads is even in transformer

But there is only one head in my transformer layer.
I hit this error while using v2.0. Unfortunately, my model strictly requires num_heads = 1.

  • It is strange, because when nn.Module.training is True (i.e. while I am training the model), no error occurs.
  • When I call .eval() to evaluate the model without dropout, the error appears: RuntimeError: Only support when num_heads is even in transformer.
  • Even more strangely, when I drop the padding_mask input, the error disappears and the model runs. However, I really need the mask for the attention model!
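
Roughly, the setup that triggers it looks like this (a minimal sketch; the sizes, the two-layer depth, and the mask here are illustrative placeholders, not my real model):

import torch
import torch.nn as nn

# A single-head encoder, built the standard way.
layer = nn.TransformerEncoderLayer(d_model=16, nhead=1, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

X = torch.randn(4, 10, 16)                   # (batch, seq_len, d_model)
mask = torch.zeros(4, 10, dtype=torch.bool)  # key padding mask
mask[:, 7:] = True                           # mark trailing positions as padding

encoder.train()
out = encoder(X, src_key_padding_mask=mask)  # fine in training mode

encoder.eval()
with torch.no_grad():
    # RuntimeError: Only support when num_heads is even in transformer
    out = encoder(X, src_key_padding_mask=mask)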
# This avoids the error, but it is not a wise choice:
# without the mask, padded positions are attended to at eval time.
if self.training:
    encode_X = self.transformer_encoder(X, src_key_padding_mask=mask)
else:
    encode_X = self.transformer_encoder(X)
return encode_X

My torch version is the latest 2.0. I found a similar error report for 1.12 (maybe the training path has been fixed but the eval path hasn't, I guess).