Multi-label approach for emotion recognition: advice needed

Hello all,

I have data as below:

| clip | happy | sad | anger | surprise | disgust | fear | neutral |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.44 | 0.0 | 0.0 | 0.0  | 0.0 | 0.0 | 0.0 | 
| 2 | 0.67 | 0.0 | 0.0 | 0.11 | 0.0 | 0.0 | 0.0 |
| 3 | 0.11 | 0.0 | 0.0 | 0.0  | 0.0 | 0.6 | 0.0 | 
| 4 | 0.44 | 0.0 | 0.0 | 0.11 | 0.0 | 0.0 | 0.0 | 
| 5 | 0.67 | 0.0 | 0.0 | 0.0  | 0.0 | 0.0 | 0.0 |

Emotion scores range from 0 to 1 (the data were normalized from original scores of 0 to 3).

Each emotion was annotated by 3 judges after they watched the movie clip (CMU-MOSEI).
As you can see, multiple emotions can occur for a single clip.

Right now I'm trying to solve this as a regression problem, where I crop 32 frames from each clip and train with MSE loss and AdamW:

        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(inputs=inputs)
        logits = outputs.logits            # raw scores, one per emotion

        loss = F.mse_loss(logits, labels)  # regress logits onto the 0-1 scores
        loss.backward()
        optimizer.step()

But is there a possibility to convert this into a multi-label classification problem and use BinaryCrossEntropy or CrossEntropy?
Or is there any other way to perform classification rather than regression?

TY

Hi pretbc!

Because the score for each emotion ranges from zero to one (and there
is no constraint that the values sum to one), it is reasonable to treat each
emotion score as if it were a probability.

Because multiple emotions can occur for a single clip and the emotion
scores are (in some sense) independent of one another (in particular,
there is no constraint that they sum to one), treating your problem as
a multi-label classification could well be a good fit for your use case.

Treating this as a regression could also be a good fit, although regressing
values that are constrained to run from zero to one is not fully natural. One
sign of this is that there is nothing that prevents the predictions of such a
regression model from lying outside the zero-to-one range.

Yes, it would be perfectly reasonable to convert this into a multi-label
classification. Whether this will work better for your use case than a
regression is an empirical question – if you have the time, try both
approaches.

As it stands, all you need to do is change:

loss = F.mse_loss(logits, labels)

to:

loss = F.binary_cross_entropy_with_logits(logits, labels)

(where I assume that your outputs.logits run from -inf to inf and
would typically be the output of a final Linear layer with out_features
equal to 7 (your number of emotion classes), and where that Linear layer
is not followed by any sigmoid() or other non-linear activation).

(You could also use the class form, BCEWithLogitsLoss, if you prefer
that stylistically to the functional form.)
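
For concreteness, here is a minimal sketch of what I have in mind (the 512
feature size, the batch of 8, and the stand-in tensors are illustrative
assumptions, not taken from your code):

import torch
import torch.nn as nn
import torch.nn.functional as F

num_emotions = 7                      # happy, sad, anger, surprise, disgust, fear, neutral
head = nn.Linear(512, num_emotions)   # assumed feature size of 512; no sigmoid afterwards

features = torch.randn(8, 512)        # stand-in for a batch of backbone features
logits = head(features)               # raw logits in (-inf, inf)

labels = torch.rand(8, num_emotions)  # stand-in for your soft emotion scores in [0, 1]
loss = F.binary_cross_entropy_with_logits(logits, labels)
loss.backward()

Note that binary_cross_entropy_with_logits accepts "soft" targets anywhere
in [0, 1], so you can pass your normalized scores directly as labels.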

Best.

K. Frank

Thanks for your answer, KFrank.

So after I switched to BCEWithLogitsLoss, another question comes to mind:

How do I evaluate this problem? I wrote a metrics function:

import torch
from torchmetrics.classification import (
    MultilabelAccuracy, MultilabelAveragePrecision, MultilabelF1Score,
)

def multilabel_metrics(predictions: torch.Tensor, target: torch.Tensor) -> dict:
    """Calculate various multi-label classification metrics."""
    num_outputs = target.shape[1]
    # binarize the soft targets to {0, 1} for the metrics (threshold at 0.5)
    th_target = torch.as_tensor((target - 0.5) > 0, dtype=torch.int64)
    multilabel_acc = MultilabelAccuracy(num_labels=num_outputs, average='macro')
    multilabel_ap = MultilabelAveragePrecision(num_labels=num_outputs, average="macro")
    multilabel_f1 = MultilabelF1Score(num_labels=num_outputs, average='macro')

    return {
        'multilabel-acc': multilabel_acc(predictions, th_target).item(),
        'multilabel-ap': multilabel_ap(predictions, th_target).item(),
        'multilabel-f1': multilabel_f1(predictions, th_target).item(),
    }
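
For a quick sanity check it can be called on dummy data (random values, just to show the shapes and output format):

preds = torch.randn(16, 7)   # stand-in for raw logits from the model
target = torch.rand(16, 7)   # stand-in for soft emotion scores in [0, 1]
print(multilabel_metrics(preds, target))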

and a validation loop:

    model.eval()
    test_loss = 0
    preds = torch.tensor([])
    targets = torch.tensor([])
    loop = tqdm(test_loader, leave=True)
    with torch.no_grad():
        for (inputs, labels) in loop:
            inputs = inputs.to(envi_builder.config.device)
            labels = labels.to(envi_builder.config.device)

            outputs = model(inputs=inputs)
            logits = outputs.logits

            # resolves to binary_cross_entropy_with_logits(logits, labels)
            loss = call_task_loss(envi_builder.config.task)(logits, labels)
            test_loss += loss.item()

            # concatenate all predictions and targets for the metrics below
            preds = torch.cat((preds, logits.detach().cpu()), 0)
            targets = torch.cat((targets, labels.detach().cpu()), 0)

    test_loss = test_loss / len(test_loader)
    metrics = call_task_metrics(envi_builder.config.task)(preds, targets)

and the metrics seem very strange (after the first 5 epochs):

Metric multilabel-acc: 0.925
Metric multilabel-ap: 0.177
Metric multilabel-f1: 0.000

example:

from torch import tensor
from torchmetrics.classification import MultilabelAccuracy
target = tensor([[0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0]])
preds = tensor([[-0.8832, -1.3300, -1.2242, -1.1196, -0.6935, -0.7516, -0.4022], [-0.8940, -1.2300, -1.2766, -1.1840, -0.8486, -0.6994, -0.4034]])
metric = MultilabelAccuracy(num_labels=7)
metric(preds, target)
>>tensor(0.9286)

--------------------

from torch import tensor
from torchmetrics.classification import MultilabelF1Score
target = tensor([[0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0]])
preds = tensor([[-0.8832, -1.3300, -1.2242, -1.1196, -0.6935, -0.7516, -0.4022], [-0.8940, -1.2300, -1.2766, -1.1840, -0.8486, -0.6994, -0.4034]])
metric = MultilabelF1Score(num_labels=7)
metric(preds, target)
>>tensor(0.)
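
For reference, this is what the metrics see after their internal sigmoid and default 0.5 threshold (my own check):

import torch
(torch.sigmoid(preds) > 0.5).int()
>>tensor([[0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]])

Every prediction is 0, so accuracy is high (13 of 14 labels are correctly 0) while F1 is 0 (no true positives).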

Is there something that I am missing?

I see your point, rtwolfe94022,

but I am still wondering how to evaluate the model.

Let's say I have emotion scores → label = tensor([[0.67, 0.0, 0.11, 0.0, 0.0, 0.0, 0.0]])

I then convert this into binary:

torch.as_tensor(label > 0, dtype=torch.float32)
>> tensor([[1., 0., 1., 0., 0., 0., 0.]])

and put it into the loss function (BCEWithLogitsLoss).

Next, during metric calculation I convert the logits into probabilities:

torch.sigmoid(preds)
>>tensor([[0.6789, 0.2092, 0.3643, 0.1256, 0.3333, 0.3205, 0.4008]])

so after MultilabelAccuracy with its threshold of 0.5 I will get ~ [1, 0, 0, 0, 0, 0, 0]
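
Putting the steps together in one sketch (reusing the numbers above):

import torch

label = torch.tensor([[0.67, 0.0, 0.11, 0.0, 0.0, 0.0, 0.0]])
bin_label = (label > 0).float()  # BCEWithLogitsLoss target: [[1., 0., 1., 0., 0., 0., 0.]]

probs = torch.tensor([[0.6789, 0.2092, 0.3643, 0.1256, 0.3333, 0.3205, 0.4008]])  # sigmoid(logits)
pred = (probs > 0.5).int()       # [[1, 0, 0, 0, 0, 0, 0]] -- the 0.11 emotion disappears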

  1. I assume the threshold of the metric has to be changed (th > 0)? See the sketch below.
  2. But that will only tell me that the emotion is present; I won't know its strength, right?
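
What I mean in point 1 would be something like this (the 0.25 threshold is just a made-up value for illustration):

from torchmetrics.classification import MultilabelF1Score

# hypothetical: lower the decision threshold so weaker emotions still count as "present"
metric = MultilabelF1Score(num_labels=7, threshold=0.25)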