Multi-label activation function

I currently have a trained system with softmax as the activation function, which returns a vector of probabilities for each class, all together summing to 1 (example: [0.6, 0.1, 0.15, 0.05, 0.1]).
I was using this model to perform multi-class classification, but I'd like to try it with a multi-label approach, and I was wondering whether there exists an activation function that returns the probability of each class being correct independently of the others (not necessarily summing to 1, example: [0.8, 0.2, 0.1, 0.7, 0.15]).

Many thanks :slight_smile:

Hi Julio!

sigmoid() should do what you want.

A couple of points:

I imagine that you have a Linear layer feeding your activation function.
The Linear layer produces a set of raw-score logits. softmax() converts
these to a set of class probabilities that sum to one. sigmoid() converts
each logit individually to the (binary) probability of the corresponding
class being “active” or “inactive.”
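
For instance, a minimal sketch with made-up logits (not from your model) showing the difference:

```python
import torch

logits = torch.tensor([1.2, -0.5, 0.3, 2.0, -1.0])  # raw scores from a Linear layer

probs_softmax = torch.softmax(logits, dim=0)  # class probabilities, sum to 1
probs_sigmoid = torch.sigmoid(logits)         # independent per-class probabilities

print(probs_softmax.sum())  # tensor(1.)
print(probs_sigmoid)        # tensor([0.7685, 0.3775, 0.5744, 0.8808, 0.2689])
```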

None of the built-in loss functions is meant to take the output of softmax() directly. For
the single-label, multi-class case, you should either feed the logits to
CrossEntropyLoss, or use log_softmax() as the activation function,
and pass its output to the NLLLoss loss function.
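
A quick sketch of that equivalence (batch size and class count here are invented for illustration):

```python
import torch

logits = torch.randn(4, 5)            # batch of 4 samples, 5 classes
targets = torch.tensor([0, 3, 1, 4])  # one class index per sample

loss_a = torch.nn.CrossEntropyLoss()(logits, targets)
loss_b = torch.nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_a, loss_b))  # True -- the two paths compute the same loss
```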

For the multi-label, multi-class case, you can feed the output of sigmoid()
to BCELoss, but you will be better off, for numerical reasons, feeding the
logits to BCEWithLogitsLoss.
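
Similarly, something like this (again with made-up shapes) shows the two computing the same value, with BCEWithLogitsLoss being the numerically safer route:

```python
import torch

logits = torch.randn(4, 5)                     # 4 samples, 5 independent labels
targets = torch.randint(0, 2, (4, 5)).float()  # multi-hot targets

loss_a = torch.nn.BCEWithLogitsLoss()(logits, targets)
loss_b = torch.nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_a, loss_b))  # True, but loss_a avoids the separate sigmoid
```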

I wouldn’t just take your trained single-label model and replace
softmax() with sigmoid() to make multi-label predictions. You can,
however, start with your trained single-label model and use it as a
pre-trained starting point for further training of your multi-label
model.
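
As a rough sketch of what that further training could look like (the tiny Linear below stands in for your real pre-trained network, and all shapes are hypothetical):

```python
import torch

# Stand-in for your pre-trained network; its last Linear layer
# already outputs one logit per class, so no softmax() is needed.
model = torch.nn.Linear(20, 5)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

inputs = torch.randn(8, 20)                    # dummy batch
targets = torch.randint(0, 2, (8, 5)).float()  # multi-hot labels, not class indices

optimizer.zero_grad()
loss = criterion(model(inputs), targets)  # train directly on the logits
loss.backward()
optimizer.step()

# At inference time, apply sigmoid() and threshold each class independently:
preds = torch.sigmoid(model(inputs)) > 0.5
```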

Best.

K. Frank
