I am using the following loss function, learning rate, and optimizer:
lr=0.001
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
Here is how I compute the loss before doing backprop:
output = net(inputs)
loss = criterion(output, labels)
Here my labels for each example are just 0, 1, or 2, and output is the output of a linear layer from the network (size three for each example). Is this the right way to use this loss?
How do I find the best learning rate and optimizer?
Yes. Your output, the first argument to CrossEntropyLoss, is the
output of your linear layer: triples of values that range from
-infinity to infinity and can be understood as logits (in contrast to
probabilities, which would range from zero to one). Your labels are
single values in {0, 1, 2}, that is, they are integer categorical labels.
This is what CrossEntropyLoss expects. (Internally, CrossEntropyLoss in
effect converts the logits you pass it into probabilities by passing
them through a softmax function, so you should not apply softmax yourself.)
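To make this concrete, here is a minimal sketch that checks the point numerically. The random `logits` tensor is only a stand-in for the output of your linear layer, and the batch size of 4 and the particular labels are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the output of the final linear layer: 4 examples, 3 classes.
# (Assumed shapes; in your code this would be net(inputs).)
logits = torch.randn(4, 3)

# Integer class labels in {0, 1, 2}; dtype is int64 (torch.long).
labels = torch.tensor([0, 2, 1, 2])

# What the question does: raw logits plus integer labels.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# The same value computed in two steps: log-softmax, then negative log-likelihood.
log_probs = F.log_softmax(logits, dim=1)
manual_loss = F.nll_loss(log_probs, labels)

# And once more by hand: mean of -log(probability of the correct class).
picked = log_probs[torch.arange(len(labels)), labels]
by_hand = -picked.mean()

print(loss.item(), manual_loss.item(), by_hand.item())
```

All three printed numbers should agree up to floating-point error, which is why passing the raw linear-layer output (with no softmax layer of your own) is the intended usage.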
As for the learning rate and optimizer: experimentation. Too small a
learning rate, and your network will train slowly; too large, and it
will jump around or become unstable.
The Adam optimizer can be a good choice. I would also suggest
trying, at least as a baseline, the plain-vanilla SGD optimizer (again
with a variety of learning rates).
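As a rough sketch of what such experimentation might look like, the following trains a small network with both optimizers over a few learning rates and compares the loss on held-out data. The toy three-cluster dataset, the tiny network, the step count, and the helper name `val_loss_for` are all assumptions made for illustration; substitute your own model and data.

```python
import torch
import torch.nn as nn

# Hypothetical toy data: three Gaussian clusters, one per class.
g = torch.Generator().manual_seed(0)
centers = torch.tensor([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
y = torch.arange(600) % 3                      # int64 labels in {0, 1, 2}
x = centers[y] + 0.5 * torch.randn(600, 2, generator=g)
x_train, y_train = x[:450], y[:450]
x_val, y_val = x[450:], y[450:]

def val_loss_for(opt_name, lr, steps=200):
    """Train a fresh network and return its loss on the validation split."""
    torch.manual_seed(0)                       # same initial weights for every run
    net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 3))
    opt_cls = {"adam": torch.optim.Adam, "sgd": torch.optim.SGD}[opt_name]
    optimizer = opt_cls(net.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(net(x_train), y_train)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return criterion(net(x_val), y_val).item()

# Sweep learning rates on a log scale for each optimizer.
results = {
    (name, lr): val_loss_for(name, lr)
    for name in ("adam", "sgd")
    for lr in (1e-4, 1e-3, 1e-2, 1e-1)
}

for (name, lr), v in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:4s} lr={lr:g}  val_loss={v:.4f}")
```

Spacing the learning rates by factors of ten is a common starting point; once a promising range shows up, you can search more finely within it. Which combination wins on this toy problem says little about your own network, so treat the ranking as illustrative only.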