I am trying to replicate this paper.
Since I am using cross-entropy, I won't apply a sigmoid to the output during training.
The input size of the LSTM is 25088, but I am not sure which `hidden_size` and `lstm_layers` to use in order to be able to use cross-entropy.
`nn.CrossEntropyLoss` expects a model output in the shape `[batch_size, nb_classes]`, so you would have to make sure your model creates such outputs containing logits.
Depending on the model architecture (and if the LSTM is the last layer), the `hidden_size` could define the number of features and could thus match the number of classes. However, if the LSTM output is passed to another layer (e.g. `nn.Linear`), then this layer would have to output the class logits.
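A minimal sketch of that setup, assuming made-up values for `hidden_size`, `num_layers`, and the number of classes (only `input_size=25088` comes from the thread) — the last time step of the LSTM output is fed through an `nn.Linear` to produce the logits `nn.CrossEntropyLoss` expects:

```python
import torch
import torch.nn as nn

# input_size comes from the thread; the rest are assumed for illustration
batch_size, seq_len, input_size = 2, 5, 25088
hidden_size, num_layers, nb_classes = 128, 2, 10

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)
fc = nn.Linear(hidden_size, nb_classes)  # maps LSTM features to class logits

x = torch.randn(batch_size, seq_len, input_size)
out, _ = lstm(x)          # out: [batch_size, seq_len, hidden_size]
logits = fc(out[:, -1])   # last time step -> [batch_size, nb_classes]

# CrossEntropyLoss applies log_softmax internally, so no sigmoid/softmax here
criterion = nn.CrossEntropyLoss()
target = torch.randint(0, nb_classes, (batch_size,))
loss = criterion(logits, target)
```

Note that the logits are passed to the loss directly; adding a sigmoid or softmax before `nn.CrossEntropyLoss` would be wrong, since it calls `log_softmax` internally.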
Thank you, I've used another `nn.Linear` layer and fixed that.