What does increasing number of heads do in the Multi-head Attention?

@googlebot
Sorry, you are correct: the PyTorch implementation (following the "Attention Is All You Need" paper) has the same parameter count regardless of the number of heads, because the embedding dimension is split evenly across the heads.
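A quick sketch of why, in plain Python (no torch needed). The sizing below mirrors how `torch.nn.MultiheadAttention` splits `embed_dim` across heads; the helper name is just for illustration:

```python
def mha_param_count(embed_dim, num_heads, bias=True):
    # PyTorch splits the embedding across heads, so the per-head
    # dimension shrinks as num_heads grows.
    assert embed_dim % num_heads == 0
    head_dim = embed_dim // num_heads

    # Q, K, V projections: each head maps embed_dim -> head_dim, and the
    # num_heads per-head matrices stack into one (embed_dim x embed_dim)
    # matrix, so num_heads cancels out of the count.
    qkv_weights = 3 * num_heads * embed_dim * head_dim  # = 3 * embed_dim**2

    # Output projection maps the concatenated heads back to embed_dim.
    out_weights = embed_dim * embed_dim
    biases = 4 * embed_dim if bias else 0
    return qkv_weights + out_weights + biases

# Same parameter count for every head count that divides embed_dim:
counts = {h: mha_param_count(512, h) for h in (1, 2, 4, 8, 16)}
```

Here `counts` holds a single distinct value, since `num_heads * head_dim` always equals `embed_dim`.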

Just to note, there are other implementations of multi-head attention where the parameter count does scale with the number of heads, e.g. when each head keeps a fixed per-head dimension instead of splitting the embedding.
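For contrast, here is a sketch of that variant (the fixed `head_dim=64` and the function name are my assumptions, not a specific library's API):

```python
def mha_param_count_fixed_head_dim(embed_dim, num_heads, head_dim=64):
    # Variant: head_dim stays fixed rather than being embed_dim // num_heads,
    # so the Q/K/V projection matrices grow with the number of heads.
    qkv_weights = 3 * num_heads * embed_dim * head_dim
    # Output projection maps the (num_heads * head_dim)-wide concat
    # back to embed_dim, so it grows with num_heads too.
    out_weights = num_heads * head_dim * embed_dim
    return qkv_weights + out_weights

# Doubling the heads doubles the parameter count in this variant:
small = mha_param_count_fixed_head_dim(512, num_heads=4)
large = mha_param_count_fixed_head_dim(512, num_heads=8)
```

With this sizing, `large` is exactly twice `small`, unlike the PyTorch convention above.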

Roy