Question about src_mask in TransformerEncoder

I’m trying to train a Transformer model similar to how BERT was trained, where elements of the input sequence are masked randomly. Reading the docs, I found that src_mask can have shape (N*num_heads, S, S). I suppose the batch size is multiplied by the number of heads because the model can apply a different mask to each head (is that correct?), so I’m simply repeating the same mask for every head. I’m using the following code to generate the masks:

import torch

num_heads = 8
batch_size = 32
seq_len = 50
prob = 0.3  # probability of masking each position

src_mask = torch.rand(batch_size, seq_len, 1) < prob  # random mask per sequence, True = masked, (N, S, 1)
src_mask = src_mask.repeat(1, 1, seq_len)  # expand to a square matrix for each sequence, (N, S, S)
src_mask = src_mask.repeat_interleave(num_heads, dim=0)  # repeat for each head, (N*num_heads, S, S)
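
For context, here’s a minimal sketch of how I’m passing the mask to the encoder (d_model, num_layers, and the random src are just placeholders, not my real setup):

import torch.nn as nn

d_model = 64  # placeholder embedding size, divisible by num_heads
encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.rand(seq_len, batch_size, d_model)  # (S, N, E), batch_first defaults to False
out = encoder(src, mask=src_mask)  # src_mask has shape (N*num_heads, S, S)
# out has shape (S, N, E)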

Is this approach correct? Thanks.