How to implement BERT using torch.nn.Transformer?

Currently, I use nn.TransformerEncoder to implement BERT.
An example of a BERT architecture:

import torch.nn as nn

# example hyperparameters (BERT-base style values)
embedding_size, num_heads, num_encoder_layers, output_vocab_size = 768, 12, 12, 30522
encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_size, nhead=num_heads)
bert = nn.Sequential(
    nn.TransformerEncoder(encoder_layer, num_layers=num_encoder_layers),  # BERT encoder stack
    nn.Linear(embedding_size, output_vocab_size),  # token-level prediction head
)
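
For reference, a minimal usage sketch (the input is assumed to already be token plus position embeddings, in the default (seq_len, batch, d_model) layout; the shapes below are just placeholders):

import torch
x = torch.rand(128, 8, embedding_size)  # hypothetical embedded batch: (seq_len, batch, embedding_size)
logits = bert(x)                        # (128, 8, output_vocab_size), e.g. for masked-token prediction
print(logits.shape)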

How do I achieve the same using the nn.Transformer API?

The doc says:

Users can build the BERT model with corresponding parameters.

Even if I set num_decoder_layers=0 when initializing it, the forward() call still requires the tgt argument for the transformer's decoder, but BERT has no decoder.
So how do we go about it?
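
For illustration, a minimal sketch of what I mean (the hyperparameter values are just placeholders):

import torch
import torch.nn as nn

model = nn.Transformer(d_model=768, nhead=12, num_encoder_layers=12, num_decoder_layers=0)
src = torch.rand(128, 8, 768)  # (seq_len, batch, d_model)
out = model(src)  # TypeError: forward() missing 1 required positional argument: 'tgt'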


Note: I am aware that HuggingFace provides BERT out of the box, but for simple non-NLP experiments with small custom BERT-like architectures, I think plain PyTorch should suffice. Please let me know if I'm wrong.

I'm struggling with the same thing, actually, and am still using a from-scratch implementation. Any help would be appreciated.

Why do you want to do that? Isn’t your initial code using TransformerEncoder simple enough?
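
That said, if you really want to go through nn.Transformer: it builds its encoder stack as a regular nn.TransformerEncoder and exposes it as the .encoder attribute, so one possible sketch (hyperparameters and the extra Linear head are placeholders of my own) is to call that submodule directly and never touch the decoder:

import torch
import torch.nn as nn

# placeholder hyperparameters
d_model, nhead, num_layers, vocab_size = 768, 12, 12, 30522

model = nn.Transformer(d_model=d_model, nhead=nhead,
                       num_encoder_layers=num_layers, num_decoder_layers=0)
head = nn.Linear(d_model, vocab_size)   # hypothetical token-level prediction head

src = torch.rand(128, 8, d_model)       # (seq_len, batch, d_model) embedded input
memory = model.encoder(src)             # use only the encoder stack; the (empty) decoder is never called
logits = head(memory)                   # (128, 8, vocab_size)

Whether that is really simpler than your original nn.TransformerEncoder version is debatable, which is why I'd ask what you expect to gain from it.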