How to implement BERT using torch.nn.Transformer?

Currently, I use nn.TransformerEncoder to implement BERT.
An example of a BERT architecture:

import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_size, nhead=num_heads)
bert = nn.Sequential(
    nn.TransformerEncoder(encoder_layer, num_layers=num_encoder_layers),
    nn.Linear(embedding_size, output_vocab_size),
)

How do I achieve the same using the nn.Transformer API?

The doc says:

Users can build the BERT model with corresponding parameters.

Even if I set num_decoder_layers=0 when initializing it, the forward() call still requires the tgt argument for the transformer's decoder, but BERT has no decoder.
So how do we go about it?

Bump… Anyone?

Note: I am aware of HuggingFace's out-of-the-box BERT, but for simple non-NLP experiments with small custom BERT-like architectures, I think plain PyTorch should suffice. Please let me know if I'm wrong.