How to handle not measured classes

Sorry, the argument should be weight for CrossEntropyLoss (CEL). See here:

https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

The weight should be based on your targets/labels. For example, suppose you have only half as many training samples for class 0 as for the other classes: you'd set that entry to 2 in the weight tensor, while the others would be 1. The weight is just a tensor of multipliers that rebalances the classes, and it should have the same length as your number of classes. It's only necessary if your training samples have a significant class imbalance.
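A minimal sketch of what that looks like, assuming three classes where class 0 has half as many samples as the others (the label tensor here is made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical labels: class 0 has half as many samples as classes 1 and 2.
targets = torch.tensor([0] * 50 + [1] * 100 + [2] * 100)

# Inverse-frequency weights, normalized so the most frequent class gets 1.0.
counts = torch.bincount(targets, minlength=3).float()
weight = counts.max() / counts  # class 0 gets 2.0, the others 1.0

criterion = nn.CrossEntropyLoss(weight=weight)
```

Any scheme that upweights the rare classes works; inverse frequency is just a simple default.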

One more possible strategy you can try is initializing the biases in your layers to 1s. For example, you could add something like this in your __init__ after creating the layers:

...
self.layer1 = nn.Linear(3, 64)
self.layer1.bias = nn.Parameter(torch.ones_like(self.layer1.bias))
...

Correct. But in another thread, adding a softmax activation on the outputs before CEL consistently gave better inference from randomly initialized weights, whereas without it the model would sometimes overfit to the targets and sometimes not. I admittedly haven't dug into why that was the case, but it could be worth trying if the other suggestions aren't working. Here is that thread (with a reproducible code snippet):
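For concreteness, a sketch of that setup with a made-up model. Note that CrossEntropyLoss already applies log-softmax internally, so an extra softmax on the outputs is non-standard; this just reproduces the configuration described above:

```python
import torch
import torch.nn as nn

# Hypothetical model and data, purely for illustration.
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 3)
targets = torch.randint(0, 4, (8,))

# Extra softmax on the outputs before the loss (the variant being discussed).
probs = torch.softmax(model(x), dim=1)
loss = criterion(probs, targets)
loss.backward()
```

The conventional setup passes the raw logits straight to the criterion instead.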