Apologies for the long post. I am very new to PyTorch and really confused about CrossEntropyLoss and NLLLoss.
I am trying to use softmax on a sequence-to-sequence problem with two types of targets: the first has a vocabulary of 10 tokens and the second of 100. The model has two outputs, and I compute a loss for each type and sum them. The model output has shape [seq_length, num_tokens, embed_dim] and the (one-hot) target has shape [seq_length, num_tokens]. For nll_loss I apply log_softmax over dimension 1 of the output, then transpose the last two dimensions (giving [seq_length, embed_dim, num_tokens]) before passing it to the loss function. For CrossEntropyLoss I pass the raw logits of shape [seq_length, num_tokens]. In both cases the loss just fluctuates and does not decrease. What am I doing wrong?
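For reference, here is my understanding of how the two losses relate (a minimal, self-contained sketch; the shapes are made up for illustration):

```python
import torch
import torch.nn.functional as F

# CrossEntropyLoss == log_softmax + NLLLoss, and both expect
# logits of shape [N, C] with *class-index* targets of shape [N].
logits = torch.randn(5, 10)            # [N, C]
target = torch.randint(0, 10, (5,))    # class indices, not one-hot

ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(ce, nll)         # identical by definition

# With extra dimensions the class dim must come second, [N, C, d1, ...],
# so an output shaped [N, d, C] needs .transpose(1, 2) first.
```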
A short description of the full process follows.
Inputs are a sequence of tuples like [(1, 15), (2, 27), (13, 10)]. First I one-hot encode each token in each tuple, then embed them, so the first element of each tuple becomes a tensor of shape [10, embed_dim] and the second a tensor of shape [100, embed_dim]. I then concatenate the two tensors of each tuple into one embedding per tuple and finally stack these to create the input of shape [3, 110, embed_dim].
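In code, the encoding step looks roughly like this (a sketch of what I am doing; the table names emb1/emb2 and the sample indices are mine, and I mask an embedding table with the one-hot vector to get the [vocab, embed_dim] shapes described above):

```python
import torch
import torch.nn.functional as F

embed_dim = 16
emb1 = torch.nn.Parameter(torch.randn(10, embed_dim))   # table for the first vocab
emb2 = torch.nn.Parameter(torch.randn(100, embed_dim))  # table for the second vocab

seq = [(1, 15), (2, 27), (3, 10)]  # example tuples with in-range indices
rows = []
for a, b in seq:
    oh1 = F.one_hot(torch.tensor(a), num_classes=10).float()   # [10]
    oh2 = F.one_hot(torch.tensor(b), num_classes=100).float()  # [100]
    e1 = oh1.unsqueeze(-1) * emb1   # [10, embed_dim], only row `a` is nonzero
    e2 = oh2.unsqueeze(-1) * emb2   # [100, embed_dim], only row `b` is nonzero
    rows.append(torch.cat([e1, e2], dim=0))  # [110, embed_dim] per tuple
x = torch.stack(rows)  # [3, 110, embed_dim]
```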
The two output logits have shapes [3, 10, embed_dim] and [3, 100, embed_dim].
If the target is [(2, 27), (12, 10), (5, 90)], the two elements are one-hot encoded separately, so target 1 has shape [3, 10] and target 2 has shape [3, 100].
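The loss is then computed per head and summed, roughly like this (again a sketch; I assume each head is reduced to plain logits of shape [seq_len, vocab], i.e. [3, 10] and [3, 100], before the loss, and the target indices here are illustrative in-range values):

```python
import torch
import torch.nn.functional as F

logits1 = torch.randn(3, 10)    # first head:  [seq_len, vocab1]
logits2 = torch.randn(3, 100)   # second head: [seq_len, vocab2]

# Both losses want class indices, so one-hot targets must be converted back:
target1_onehot = F.one_hot(torch.tensor([2, 9, 5]), num_classes=10).float()
target2_onehot = F.one_hot(torch.tensor([27, 10, 90]), num_classes=100).float()
t1 = target1_onehot.argmax(dim=1)   # [3] class indices
t2 = target2_onehot.argmax(dim=1)   # [3]

loss = F.cross_entropy(logits1, t1) + F.cross_entropy(logits2, t2)
# or equivalently with NLLLoss:
loss_nll = F.nll_loss(F.log_softmax(logits1, dim=1), t1) + \
           F.nll_loss(F.log_softmax(logits2, dim=1), t2)
```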