Modified CE Loss function

Hadjer13 · June 20, 2020, 10:55pm

I’m using code from GitHub for multi-class classification that uses a modified cross-entropy loss function that discards one class (the first class) from the loss.
Could someone please identify the type of CE loss used and give me its name if it has one, or a resource to read about it?

I could tell that per_example_loss is a CE loss without the class ‘0’.
Could someone tell me what ‘rc_loss’ refers to?

# Excerpt from the model's forward pass (assumes: import torch.nn.functional as F).
# logits, labels, device, loss and self.num_labels come from the surrounding code.
probabilities = F.softmax(logits, dim=-1)
log_probs = F.log_softmax(logits, dim=-1)

print("labels", labels)
one_hot_labels = F.one_hot(labels, num_classes=self.num_labels)
if device >= 0:
    one_hot_labels = one_hot_labels.to(device)

print("hot labels", one_hot_labels)
# Drop column 0; in each row only the true-class entry can be non-zero.
dist = one_hot_labels[:, 1:].float() * log_probs[:, 1:]
print("hot labels except others", one_hot_labels[:, 1:].float())
print("log prob except others", log_probs[:, 1:])

# min picks log p(true class); rows labelled 0 are all zeros and give 0.
example_loss_except_other, _ = dist.min(dim=-1)

print("example loss except others", example_loss_except_other)
per_example_loss = -example_loss_except_other.mean()

# Zero out the true-class probability, then take the largest remaining
# probability among classes 1 and up.
rc_probabilities = probabilities - probabilities * one_hot_labels.float()
print("rc_prob ", rc_probabilities)
second_pre, _ = rc_probabilities[:, 1:].max(dim=-1)
print("second prob ", second_pre)
rc_loss = -(1 - second_pre).log().mean()
print("rc loss ", rc_loss)

# print(loss, per_example_loss, rc_loss)
loss += per_example_loss + 5 * rc_loss

Thanks in advance!

harsha_g · June 21, 2020, 4:08am

My impression is that the 0th class you are referring to is the [CLS] token prepended to the sequence before it is fed to BERT. I also presume that RC in rc_probabilities simply stands for Relation Classification. Did you come across this paper while working on the code?
https://arxiv.org/pdf/1905.08284.pdf

It may have more information on what you are looking for.

Hadjer13 · June 21, 2020, 9:24am

I am actually using the implementation of the paper you mentioned.
From my understanding, per_example_loss gives the cross-entropy loss without the class ‘Others’ of the dataset (I printed one_hot_labels[:, 1:] to make sure that it excludes the class Others).
What I couldn’t understand is why the author added rc_probabilities / rc_loss. How does it affect the loss function?
Is it a specific variant of the cross-entropy loss?
Thanks in advance
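
To make the reading of per_example_loss above concrete, here is a minimal sketch in plain Python (no PyTorch needed) that mirrors the one-hot / slice / min steps of the snippet on nested lists and compares the result with a direct "negative log-likelihood, skipping label 0" computation. The function names and the toy batch are hypothetical, chosen only for illustration; they are not from the repository.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def per_example_loss(logits, labels):
    """Mirror of the snippet: one_hot[:, 1:] * log_probs[:, 1:], min, -mean."""
    per_row = []
    for row, label in zip(logits, labels):
        log_probs = log_softmax(row)
        # Keep log p only at the true class; column 0 is dropped entirely.
        masked = [lp if c == label else 0.0
                  for c, lp in enumerate(log_probs) if c >= 1]
        # All-zero row when label == 0, otherwise the (negative) log p(true).
        per_row.append(min(masked))
    return -sum(per_row) / len(per_row)

def masked_nll(logits, labels):
    """Sum of -log p(true class) over rows with label != 0, over batch size."""
    total = sum(-log_softmax(row)[label]
                for row, label in zip(logits, labels) if label != 0)
    return total / len(labels)

# Hypothetical toy batch: 3 examples, 4 classes, class 0 stands for "Others".
logits = [[2.0, 0.5, 0.1, -1.0],
          [0.2, 1.5, 0.3, 0.0],
          [0.1, 0.2, 3.0, 0.4]]
labels = [0, 1, 2]

print(per_example_loss(logits, labels), masked_nll(logits, labels))
```

The two printed values should agree up to floating-point rounding. Note that the division is by the full batch size, as `.mean()` does in the snippet, so rows labelled 0 contribute zero loss but still count in the denominator; this is not the same as averaging over the non-Others rows only.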

harsha_g · June 21, 2020, 11:59am

Okay, this took a while to figure out (although not completely). The problem is that the paper says one thing but the code does something else. Take a look at the other paper by the same authors (at the very bottom of the README.md in the link you posted); it could probably explain why they coded the loss function slightly differently from the paper (in fact, they call it the rank loss function). I hope this helps.

Hadjer13 · June 21, 2020, 12:43pm

Yes, this helps a lot, thank you!
Indeed, it’s a ranking loss.
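
For anyone reading later, here is a minimal plain-Python sketch of what the rc_loss term in the snippet computes, under the ranking-loss reading reached above. It mirrors the tensor operations on nested lists: zero out the true class, drop column 0, take the largest remaining probability, and penalise it with -log(1 - p). The function names and toy inputs are hypothetical and only for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

def rc_loss(logits, labels):
    """Mirror of the snippet's rc_loss for a batch given as nested lists."""
    terms = []
    for row, label in zip(logits, labels):
        probs = softmax(row)
        # probabilities - probabilities * one_hot: the true class becomes 0.
        rivals = [0.0 if c == label else p for c, p in enumerate(probs)]
        # [:, 1:].max(): strongest competing class, ignoring column 0.
        second_pre = max(rivals[1:])
        terms.append(-math.log(1.0 - second_pre))
    return sum(terms) / len(terms)

# Hypothetical toy rows with 4 classes; the true label is 1 in both cases.
confident = [[0.0, 6.0, 0.0, 0.0]]  # true class clearly dominates
confused = [[0.0, 3.0, 3.0, 0.0]]   # class 2 competes with the true class

print(rc_loss(confident, [1]), rc_loss(confused, [1]))
```

The second value should be noticeably larger than the first: the term grows as the best wrong class (other than class 0) gains probability. For a row labelled 0, nothing is zeroed among classes 1 and up, so the term pushes down every non-Others probability. In the snippet this term is added to per_example_loss with a weight of 5.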