Will nn.Transformer accept 3D masks?

I noticed that nn.MultiheadAttention does accept a 3D mask of shape (N*num_heads, L, S), but the nn.Transformer documentation only lists a 2D mask of shape (L, S) as a possible attention mask shape.

However, looking at the source, nn.Transformer is built on top of MultiheadAttention (`from .activation import MultiheadAttention`). Can I expect a Transformer given a 3D mask to work properly?
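For context, here is a minimal sketch of what I have in mind. The specific dimensions are made up for illustration, and the assumption being tested is that nn.Transformer simply forwards `src_mask`/`tgt_mask` to its MultiheadAttention layers unchanged, so the 3D shape would be accepted:

```python
import torch
import torch.nn as nn

d_model, nhead, N, S, T = 16, 4, 2, 5, 7  # arbitrary small sizes
model = nn.Transformer(
    d_model=d_model, nhead=nhead,
    num_encoder_layers=1, num_decoder_layers=1,
    dim_feedforward=32,
)

src = torch.rand(S, N, d_model)  # (S, N, E), default batch_first=False
tgt = torch.rand(T, N, d_model)  # (T, N, E)

# 3D boolean masks: one (L, S) mask per head per batch element,
# the shape MultiheadAttention documents as (N*num_heads, L, S).
# All-False = nothing is masked out.
src_mask = torch.zeros(N * nhead, S, S, dtype=torch.bool)
tgt_mask = torch.zeros(N * nhead, T, T, dtype=torch.bool)

out = model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask)
print(out.shape)
```

If the 3D masks are forwarded as I expect, this should run without error and print `torch.Size([7, 2, 16])`, i.e. the usual (T, N, E) output shape.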