Is this the right way to use CrossEntropyLoss?

I have a classifier that has to predict whether a given sentence is positive, negative, or neutral.

This is my model:

Sentiment_LSTM(
  (embedding): Embedding(19612, 400)
  (lstm): LSTM(400, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=512, out_features=3, bias=True)
)
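
For reference, a module with this printed structure could be defined roughly as below; the forward logic (e.g. using the last timestep's output before the dropout and linear layers) is an assumption and may differ from the actual code:

import torch
import torch.nn as nn

# Rough sketch of a module matching the printed structure above.
class Sentiment_LSTM(nn.Module):
    def __init__(self, vocab_size=19612, embed_dim=400, hidden_dim=512, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                            batch_first=True, dropout=0.5)
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                        # x: (batch, seq_len) token ids
        embedded = self.embedding(x)             # (batch, seq_len, embed_dim)
        lstm_out, _ = self.lstm(embedded)        # (batch, seq_len, hidden_dim)
        last = self.dropout(lstm_out[:, -1, :])  # last timestep: (batch, hidden_dim)
        return self.fc(last)                     # raw logits: (batch, 3)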
Using the following loss function and learning rate:
lr=0.001

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

Here is how I am doing backprop:

output = net(inputs)
loss = criterion(output, labels)

Here, my labels for each example are just 0, 1, or 2, and output is the output of the linear layer from the network (size three for each example). Is this the right way to use this loss?
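
For completeness, a full training step with these pieces would look roughly like the sketch below (it reuses net, criterion, and optimizer from above; the batch size and sequence length are made up for illustration):

# One training step; shapes are illustrative (batch_size=32, seq_len=50).
inputs = torch.randint(0, 19612, (32, 50))   # (batch, seq_len) token ids
labels = torch.randint(0, 3, (32,))          # (batch,) class indices in {0, 1, 2}

optimizer.zero_grad()
output = net(inputs)                         # (batch, 3) raw logits, no softmax
loss = criterion(output, labels)             # CrossEntropyLoss(logits, class indices)
loss.backward()
optimizer.step()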

How do I find the best learning rate and optimizer?

Hi Vikas!

Yes. Your output, the first argument to CrossEntropyLoss, is the output of
your linear layer: triples of values that range from -infinity to infinity and
can be understood as logits (in contrast to probabilities, which would range
from zero to one). Your labels are single values in {0, 1, 2}, that is, they
are integer categorical labels. This is what CrossEntropyLoss expects.
(Internally, CrossEntropyLoss converts the logits you pass it into
probabilities by, in effect, passing them through a softmax function.)
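
You can check this equivalence directly; the following small sketch (with made-up logits) shows that CrossEntropyLoss applied to raw logits matches log_softmax followed by NLLLoss:

import torch
import torch.nn as nn

logits = torch.tensor([[ 1.2, -0.3,  0.5],    # made-up logits for two examples
                       [-0.7,  2.1,  0.0]])
labels = torch.tensor([0, 1])                 # integer class labels

loss_a = nn.CrossEntropyLoss()(logits, labels)
loss_b = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_a, loss_b))         # True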

Experimentation is really the only way. Too small a learning rate, and your
network will train slowly; too large, and it will jump around or become unstable.

The Adam optimizer can be a good choice. I would also suggest
trying, at least as a baseline, the plain-vanilla SGD optimizer (again
with a variety of learning rates).
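
One practical approach is to run short trials over a small grid of optimizers and learning rates and compare validation loss. A rough sketch (train_one_epoch is a placeholder for your own training loop, and the learning-rate values are just examples):

import torch

# Sweep over optimizers and learning rates; train_one_epoch() is a
# placeholder that should train briefly and return a validation loss.
for opt_name, opt_cls in [("Adam", torch.optim.Adam), ("SGD", torch.optim.SGD)]:
    for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
        net = Sentiment_LSTM()                      # re-create the model for each trial
        optimizer = opt_cls(net.parameters(), lr=lr)
        val_loss = train_one_epoch(net, optimizer)  # placeholder training loop
        print(f"{opt_name} lr={lr}: val_loss={val_loss:.4f}")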

Good luck.

K. Frank
