Here is the code that attempts to learn an encoding for paragraphs (each paragraph being a set of sentences that are themselves already encoded):
```python
import torch
import torch.nn as nn

class TransformerEnc(nn.Module):
    def __init__(self):
        super(TransformerEnc, self).__init__()
        d_model = 1024
        self.posenc = PositionalEncoding(d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8)
        self.model = nn.TransformerEncoder(encoder_layer, num_layers=6)

    def forward(self, x):
        x = self.posenc(x)
        output = self.model(x)
        output = torch.sum(output, 1)  # pool over the sentence dimension
        return output
```
The `forward` function gets a tensor `x` of shape `[64, 20, 1024]`, where 20 is the maximum number of sentences in a paragraph and 1024 is the dimension of each encoded sentence.
Now I need an output of size `[64, 1024]`, where the 20 sentences are embedded into a single 1024-dimensional vector.
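As a sanity check, the shapes do line up: a minimal sketch (dropping the undefined `PositionalEncoding` module and using random input) shows that summing the encoder output over dimension 1 yields the desired `[64, 1024]` tensor. Note that by default `nn.TransformerEncoderLayer` treats dimension 0 as the sequence dimension, not the batch dimension, although the output shape comes out the same either way.

```python
import torch
import torch.nn as nn

# Hypothetical stand-alone shape check mirroring the question's setup.
d_model = 1024
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

x = torch.randn(64, 20, d_model)   # [batch, sentences, sentence embedding]
out = encoder(x)                   # same shape as the input: [64, 20, 1024]
pooled = torch.sum(out, 1)         # collapse the sentence dimension
print(pooled.shape)                # torch.Size([64, 1024])
```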
When I train a model with this embedding, I get random-chance accuracy. If I replace this code with an LSTM, it trains well, so clearly there is some issue with the Transformer version.
Any ideas what might be the problem?