One-vs-rest (multi-class classification) with a fully connected network and sigmoid in PyTorch

Hi all, I am trying to implement a one-vs-all classification scheme without using softmax. I have 10 classes, so I perform 10 binary classifications, but it doesn't work properly and I don't know why. I would be grateful if someone could help me with this. Here is a snapshot:

for i in range(num_classes):
    # binary cross-entropy for the "class i vs. the rest" problem
    binary_cross_entropy = nn.BCELoss()
    # make the ith class 1 and all others 0
    tgt_class_label = torch.eq(labels, i).float()
    print(f"before_sigmoid: {pred_out[:, i]}")
    print(f"normalized: {pred[:, i]}")
    print(f"True Labels: {tgt_class_label}")
    class_loss = binary_cross_entropy(pred[:, i], tgt_class_label)
    Loss += class_loss

It’s hard to tell from this snapshot what’s really going on. Can you provide at least the training procedure (or more)? That would be helpful.

@Chirath_Pansilu This is the training part of the model:

for inputs, labels in train_dl:
    # send data and labels to the GPU
    src = inputs.to(device)
    labels = labels.to(device)
    # reset the gradients
    optimizer.zero_grad()
    # output predictions
    pred_out, feas = model(src)
    # apply sigmoid to the output predictions of each class
    pred = sigmoid_fun(pred_out)
    # accumulate the per-class binary losses for this batch
    Loss = 0
    for i in range(num_classes):
        # define binary cross-entropy loss
        binary_cross_entropy = nn.BCELoss()
        # make the ith class 1 and all others 0
        tgt_class_label = torch.eq(labels, i).float()
        # compute binary cross-entropy between the ith output column and the binarized target
        class_loss = binary_cross_entropy(pred[:, i], tgt_class_label)
        Loss += class_loss
    # Loss = criterion(pred, labels)
    Loss.backward(retain_graph=True)
    optimizer.step()

Hey, it looks like you have the theory part wrong. When you do one-vs-rest classification you have to build, in this case, 10 different models, since there are 10 different classes to distinguish between.

But you have only created one model and are calculating the loss for different labels. That's not how one-vs-rest works. You have to create 10 different models and train each one for its own class label: one for the 1st class, another for the 2nd class, and so on, each trained separately. Then, when you want to predict an outcome, you run the input through every model and pick the class whose model gives the highest probability. A minimal sketch of the idea is below.
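For reference, here is a minimal, hypothetical sketch of that scheme. The names (models, feature_dim, train_step, predict) are placeholders rather than anything from the code above, and it assumes the inputs have already been turned into fixed-size feature vectors:

import torch
import torch.nn as nn

num_classes = 10
feature_dim = 32  # assumed feature size, adjust to your data

# one independent binary classifier per class
models = [nn.Sequential(nn.Linear(feature_dim, 1), nn.Sigmoid()) for _ in range(num_classes)]
criterion = nn.BCELoss()

def train_step(i, optimizer, features, labels):
    # train the i-th model on "class i vs. the rest"
    target = torch.eq(labels, i).float().unsqueeze(1)
    optimizer.zero_grad()
    prob = models[i](features)
    loss = criterion(prob, target)
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(features):
    # run the input through every model and pick the most confident class
    probs = torch.cat([m(features) for m in models], dim=1)  # shape (batch, num_classes)
    return probs.argmax(dim=1)

Each binary classifier has its own parameters here, and in practice each would also get its own optimizer.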

Thank you so much @Chirath_Pansilu for this clear explanation. It's now clear to me. But I have an additional concern: my architecture has two parts, a CNN feature extractor and a linear layer as the classifier. This classifier has 10 output neurons, and I intended each neuron to predict the corresponding class. If this is still not correct and I have to use separate models, can I freeze the CNN feature extractor and make 10 classifier models, i.e. one for each class?

Hi, I’m not sure I understand what you meant by this.

You said that you are doing one-vs-rest classification, but you mentioned above that there are 10 output neurons. That is a normal multi-class classifier.

But if you want to train one-vs-rest classifiers, then yes, you can freeze the CNN after it has been trained once, because it is only used as a feature extractor.
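As a rough sketch of the freezing part (assuming the feature extractor is exposed as an attribute called cnn, which is just an assumed name here), you turn off gradients for its parameters and give the optimizer only the parameters that are still trainable:

# freeze the already-trained feature extractor
for p in model.cnn.parameters():
    p.requires_grad = False

# optimize only the parameters that still require gradients (the classifier heads)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)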

If I have misunderstood something, please tell me.
Thank you

Yes, you are right. Following your suggestion, I have done the following; please check whether I am doing it right:
1- I created a module list of linear layers equal to the number of classes, i.e. 10 linear layers.
2- Each linear layer is Linear(feature_dim, 1).
3- I pass the features to every module and calculate the loss with respect to each one independently.

The model:

class one_vs_all(nn.Module):
    def __init__(self, input_dim=32, out_dim=1, num_classes=10, dropout=0):
        super(one_vs_all, self).__init__()
        # not the best model...
        self.input_dim = input_dim
        self.out_dim = out_dim
        self.sigmoid = nn.Sigmoid()
        self.num_classes = num_classes
        # one independent binary head per class; each needs its own nn.Linear,
        # otherwise every class would share the same weights
        self.classifiers = nn.ModuleList(
            [nn.Linear(self.input_dim, self.out_dim) for i in range(self.num_classes)]
        )

    def forward(self, class_index, x):
        # probability that x belongs to class class_index (vs. the rest)
        predictions = self.classifiers[class_index](x)
        logits = self.sigmoid(predictions)
        return logits
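For what it's worth, since forward only scores one class at a time, inference would need a small helper along these lines (predict is a hypothetical name, not part of the class above):

def predict(model, x):
    # collect the per-class probabilities and pick the most confident class
    probs = torch.cat([model(idx, x) for idx in range(model.num_classes)], dim=1)
    return probs.argmax(dim=1)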

And here is the training function:

from sklearn.metrics import accuracy_score

def train(model, train_dl, optimizer, criterion, config, device):
    model.train()
    epoch_loss = 0
    epoch_accuracy = 0
    class_accuracy = {f'class{k+1}': [] for k in range(num_classes)}
    features = []; train_labels = []; total_pred = []
    for inputs, labels in train_dl:
        # the inputs are already the (frozen) CNN features
        src = inputs.to(device)
        labels = labels.to(device)
        # one-vs-all: one binary loss and score per class
        Loss = 0
        batch_probs = []
        for idx in range(num_classes):
            optimizer.zero_grad()
            # make the idx-th class 1 and all others 0
            tgt_class_label = torch.eq(labels, idx).float().unsqueeze(1)
            logits = model(idx, src)
            # criterion is assumed to be nn.BCELoss()
            class_loss = criterion(logits, tgt_class_label)
            Loss += class_loss.item()
            class_loss.backward()
            optimizer.step()
            # per-class binary accuracy: threshold the sigmoid output at 0.5
            class_pred = (logits > 0.5).float()
            class_accuracy[f'class{idx+1}'].append(accuracy_score(
                tgt_class_label.cpu().numpy().ravel(),
                class_pred.detach().cpu().numpy().ravel()))
            batch_probs.append(logits.detach())
        epoch_loss += Loss
        # batch accuracy = mean of the per-class binary accuracies
        epoch_accuracy += sum(acc[-1] for acc in class_accuracy.values()) / num_classes
        # multi-class prediction = class whose head gives the highest probability
        total_pred.append(torch.cat(batch_probs, dim=1).argmax(dim=1))
        features.append(src)
        train_labels.append(labels)
    return epoch_loss / len(train_dl), epoch_accuracy / len(train_dl), torch.cat(features), torch.cat(total_pred), torch.cat(train_labels)
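If it helps, a hypothetical way to wire everything together (num_classes, num_epochs, train_dl, config, and device are assumed to be defined elsewhere) could look like:

model = one_vs_all(input_dim=32).to(device)
criterion = nn.BCELoss()
# one optimizer over all per-class heads; the frozen CNN is not included
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(num_epochs):
    loss, acc, feats, preds, lbls = train(model, train_dl, optimizer, criterion, config, device)
    print(f"epoch {epoch}: loss = {loss:.4f}, accuracy = {acc:.4f}")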