One idea is to work with one-hot vectors instead. Your last fc layer would be connected to 5 different outputs, such that:
output 1 is a 6-element vector that represents the first label. If your label is 3, for example, then your vector would be [0, 0, 0, 1, 0, 0] (counting classes from 0),
output 2 is an 8-element vector,
and so on for the remaining outputs.
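Once you have these representations, you apply CrossEntropyLoss to each of your outputs, and your final loss is the sum of these 5 losses. If this is PyTorch's nn.CrossEntropyLoss, note that it expects the raw scores (logits) from each output and takes the target as a class index (e.g. 3), so you don't have to build the one-hot vector explicitly.
You can also weight the individual losses if you think one label is more important than another.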
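
Here is a minimal sketch of that idea in PyTorch, in case it helps. It is only an illustration under some assumptions: the sizes 6 and 8 come from the description above, while the other three output sizes, the input and hidden dimensions, the loss weights, and the names MultiHeadClassifier and multi_label_loss are all made up for the example.

```python
import torch
import torch.nn as nn

# Number of classes per label. 6 and 8 are taken from the answer above;
# the last three sizes are hypothetical placeholders.
HEAD_SIZES = [6, 8, 4, 3, 5]


class MultiHeadClassifier(nn.Module):
    """Shared backbone followed by one linear output (head) per label."""

    def __init__(self, in_features=32, hidden=64, head_sizes=HEAD_SIZES):
        super().__init__()
        # Stand-in for whatever network sits before the last fc layer.
        self.backbone = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, n) for n in head_sizes])

    def forward(self, x):
        features = self.backbone(x)
        # One tensor of logits per label, with shape (batch, n_classes).
        return [head(features) for head in self.heads]


def multi_label_loss(logits_list, targets, weights=None):
    """Weighted sum of one cross-entropy loss per output.

    targets: LongTensor of shape (batch, num_heads) holding class indices.
    """
    criterion = nn.CrossEntropyLoss()
    if weights is None:
        weights = [1.0] * len(logits_list)
    losses = [
        w * criterion(logits, targets[:, i])
        for i, (logits, w) in enumerate(zip(logits_list, weights))
    ]
    return torch.stack(losses).sum()


torch.manual_seed(0)
model = MultiHeadClassifier()
x = torch.randn(16, 32)  # dummy batch of 16 samples
# Random class indices for each label, stacked into shape (16, 5).
targets = torch.stack([torch.randint(0, n, (16,)) for n in HEAD_SIZES], dim=1)

outputs = model(x)
# Hypothetical weights: the third label counts twice as much as the others.
loss = multi_label_loss(outputs, targets, weights=[1.0, 1.0, 2.0, 1.0, 1.0])
loss.backward()
```

At prediction time you would take the argmax of each output separately to get the 5 predicted labels.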