Attention weights with multiple heads in nn.MultiheadAttention?

I’m confused about why attn_output_weights is documented to have shape (N, L, S) regardless of the number of heads. Wouldn’t there be a unique set of weights for each head?

https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html

This seems to be because the attention weights are averaged across all of the heads before being returned (the averaging step happens inside torch.nn.functional.multi_head_attention_forward).
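A minimal sketch that checks this, assuming PyTorch >= 1.11, where the forward call gained an average_attn_weights flag; passing average_attn_weights=False returns the per-head weights instead, and averaging them over the head dimension should recover the default output:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 8, 2
N, L, S = 3, 4, 5  # batch size, target length, source length

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
query = torch.randn(N, L, embed_dim)
key = torch.randn(N, S, embed_dim)
value = torch.randn(N, S, embed_dim)

# Default: weights averaged over heads -> shape (N, L, S)
_, avg_weights = mha(query, key, value)

# Per-head weights -> shape (N, num_heads, L, S)
_, head_weights = mha(query, key, value, average_attn_weights=False)

print(avg_weights.shape)   # torch.Size([3, 4, 5])
print(head_weights.shape)  # torch.Size([3, 2, 4, 5])

# The documented (N, L, S) output is just the mean over the head dimension
print(torch.allclose(avg_weights, head_weights.mean(dim=1)))  # True
```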