I'm trying to implement the Transformer for NMT (neural machine translation).
So far, my code looks like this:
import torch
import torch.nn as nn

# PositionalEncoding and device are defined elsewhere in my code
class transformer(nn.Module):
    def __init__(self, src_vocab_size, trg_vocab_size, emb_dim=512, m_dim=512, d_model=512):
        super().__init__()
        self.transformer = nn.Transformer(d_model=d_model)
        # separate embedding tables for the source and target vocabularies
        self.src_embedding = nn.Embedding(src_vocab_size, emb_dim)
        self.trg_embedding = nn.Embedding(trg_vocab_size, emb_dim)
        self.pos_encoder = PositionalEncoding(emb_dim)
        self.fc_out = nn.Linear(m_dim, trg_vocab_size)

    def forward(self, src, trg):
        src = self.pos_encoder(self.src_embedding(src))
        trg = self.pos_encoder(self.trg_embedding(trg))
        # causal mask sized by the target length (dim 0, since inputs are sequence-first)
        trg_mask = self.transformer.generate_square_subsequent_mask(trg.shape[0]).to(device)
        out = self.transformer(src, trg, src_mask=None, tgt_mask=trg_mask)
        return self.fc_out(out)
The inputs are sequence-first:

src: (SRC_LENGTH, BATCH_SIZE)
trg: (TRG_LENGTH, BATCH_SIZE)
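To sanity-check the sequence-first convention, here's a tiny standalone sketch (toy dimensions, my choice; embeddings skipped and replaced with random tensors) of what nn.Transformer expects and returns with its default batch_first=False:

```python
import torch
import torch.nn as nn

# Toy sizes, just for the shape check (assumptions, not my real config)
SRC_LEN, TRG_LEN, BATCH, D_MODEL = 7, 5, 3, 16

model = nn.Transformer(d_model=D_MODEL, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=32)

# default layout is (seq_len, batch, d_model)
src = torch.randn(SRC_LEN, BATCH, D_MODEL)
trg = torch.randn(TRG_LEN, BATCH, D_MODEL)

trg_mask = model.generate_square_subsequent_mask(TRG_LEN)
out = model(src, trg, tgt_mask=trg_mask)
print(out.shape)  # torch.Size([5, 3, 16]) -- (TRG_LEN, BATCH, D_MODEL)
```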
But the loss I'm getting is suspiciously small. I think it's because of the attention mask in

self.trg_mask = self.transformer.generate_square_subsequent_mask(trg.shape[0]).to(device)

but I'm not really sure.
What do you guys think?
Thanks
edit: I'm thinking it's because I'm creating just 1 mask and I need BATCH_SIZE masks. I'm trying with batch_size = 1.
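For reference, here's a standalone sketch of what generate_square_subsequent_mask actually returns (toy sizes, my assumption being that the single 2D mask gets applied to every batch element):

```python
import torch
import torch.nn as nn

# Tiny model just to call the mask helper (sizes are arbitrary)
model = nn.Transformer(d_model=8, nhead=2,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=16)

mask = model.generate_square_subsequent_mask(4)
print(mask.shape)  # torch.Size([4, 4]) -- one 2D mask, no batch dimension
print(mask)
# 0.0 on/below the diagonal (position is visible), -inf above it (future blocked)
```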