Currently, I use nn.TransformerEncoder
to implement BERT.
An example of a BERT-like architecture:

import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_size, nhead=num_heads)
bert = nn.Sequential(
    nn.TransformerEncoder(encoder_layer, num_layers=num_encoder_layers),
    nn.Linear(embedding_size, output_vocab_size),
)
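For reference, this encoder-only stack runs end to end with a single input tensor (hyperparameter values below are illustrative, not from my actual model):

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters
embedding_size, num_heads, num_encoder_layers, output_vocab_size = 64, 4, 2, 1000

encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_size, nhead=num_heads)
bert = nn.Sequential(
    nn.TransformerEncoder(encoder_layer, num_layers=num_encoder_layers),
    nn.Linear(embedding_size, output_vocab_size),
)

# Default layout for nn.TransformerEncoder is (seq_len, batch, d_model)
x = torch.rand(10, 32, embedding_size)
logits = bert(x)  # shape: (10, 32, output_vocab_size)
```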
How do I achieve the same using the nn.Transformer
API?
The doc says:
Users can build the BERT model with corresponding parameters.
Even if I set num_decoder_layers=0
when initializing it, the forward()
call still requires the tgt argument
for the transformer's decoder, but BERT has no decoder.
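A minimal reproduction of what I mean (the hyperparameter values are arbitrary, just to construct the module):

```python
import torch
import torch.nn as nn

# num_decoder_layers=0, yet forward() still has tgt as a required
# positional argument in its signature
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=0)
src = torch.rand(10, 32, 64)

try:
    model(src)       # no tgt supplied
    raised = False
except TypeError:    # forward() missing required argument: 'tgt'
    raised = True
```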
So how do we go about it?