Modified CE Loss function

I’m using code from GitHub for multi-class classification that uses a modified cross-entropy loss function which discards one class (the first class) from the loss.
Could someone identify the type of CE loss used, give me its name if it has one, or point me to a resource to read about it?

I can see that per_example_loss is a CE loss without the class ‘0’.
Could someone tell me what ‘rc_loss’ refers to?

    probabilities = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot_labels = F.one_hot(labels, num_classes=self.num_labels)
    if device >= 0:
        # move the one-hot labels onto the same device as the logits
        one_hot_labels = one_hot_labels.to(device)
    print("hot labels", one_hot_labels)

    # mask out class 0, keeping only the gold-class log-probability
    dist = one_hot_labels[:, 1:].float() * log_probs[:, 1:]
    print("hot labels except others", one_hot_labels[:, 1:].float())
    print("log prob except others", log_probs[:, 1:])

    # min picks the gold entry: it is the only non-zero value and it is negative
    example_loss_except_other, _ = dist.min(dim=-1)
    print("example loss except others", example_loss_except_other)
    per_example_loss = -example_loss_except_other.mean()

    # zero out the gold-class probability, then take the largest
    # remaining probability among the non-zero classes
    rc_probabilities = probabilities - probabilities * one_hot_labels.float()
    print("rc_prob ", rc_probabilities)
    second_pre, _ = rc_probabilities[:, 1:].max(dim=-1)
    print("second prob ", second_pre)
    rc_loss = -(1 - second_pre).log().mean()
    print("rc loss ", rc_loss)

    # print(loss, per_example_loss, rc_loss)
    loss += per_example_loss + 5 * rc_loss
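To convince myself what per_example_loss computes, here is a small self-contained sketch (toy logits and labels that I made up, not from the actual model): because the one-hot mask zeroes every entry except the gold class, the min over the last dimension just recovers the gold-class log-probability, so for examples whose gold label is not class 0 this term is exactly the ordinary cross-entropy loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: batch of 3 examples, 4 classes (class 0 = "Others").
torch.manual_seed(0)
logits = torch.randn(3, 4)
labels = torch.tensor([1, 3, 2])  # no gold label is class 0 here
num_labels = 4

log_probs = F.log_softmax(logits, dim=-1)
one_hot_labels = F.one_hot(labels, num_classes=num_labels)

# Same masking as the snippet: everything except the gold entry is zero,
# and the gold entry is a (negative) log-probability, so min() selects it.
dist = one_hot_labels[:, 1:].float() * log_probs[:, 1:]
example_loss_except_other, _ = dist.min(dim=-1)
per_example_loss = -example_loss_except_other.mean()

# When no gold label is class 0, this matches plain cross entropy:
ce = F.cross_entropy(logits, labels)
print(torch.allclose(per_example_loss, ce))  # True
```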

Thanks in advance !

My impression is that the 0th class you are referring to is the [CLS] token appended at the beginning of the sequence before feeding it to BERT. Also, I presume RC in rc_probabilities merely stands for Relation Classification. Did you come across this work while working on the code? It may have more information on what you are looking for.

I am actually using the implementation of the paper you mentioned.
From my understanding, per_example_loss gives the cross-entropy loss without the ‘Others’ class of the dataset (I printed one_hot_labels[:, 1:] to make sure it excludes the Others class).
What I couldn’t understand is why he added the rc_probabilities/rc_loss part. How does it affect the loss function?
Is it a specific variant of the cross-entropy loss?
Thanks in advance

Okay, this took a while to figure out (although not completely). The problem is that the paper says one thing but the code does something else :wink:. Take a look at the other paper by the same authors (at the very bottom of the page in the link you posted); it probably explains why they coded the loss function slightly differently from the paper (in fact they call it a ranking loss function). I hope this helps.

YES this helps a lot thank you !
Indeed, it’s a ranking loss