Multi-class classification

I am trying to do multi-class classification in PyTorch. The code runs fine, but the accuracy is not good. I was wondering if my code is correct?
The input to the model is a 2000x100 matrix and the output is a 1D tensor containing the label index for each row, e.g. tensor([2,5,31,…,7]) => 2000 elements

# another multi-class classification
class MultiClass(nn.Module):
    
    def __init__(self, x_dim, z_dim):
        super(MultiClass, self).__init__()

        self.cf1  = nn.Linear(z_dim, z_dim)
        self.cf1N = nn.BatchNorm1d(num_features = z_dim)
        self.cf1D = nn.Dropout(0.5) 
        
        self.cf2  = nn.Linear(z_dim, z_dim)
        self.cf2N = nn.BatchNorm1d(num_features = z_dim)
        self.cf2D = nn.Dropout(0.5)
        self.cf3  = nn.Linear(z_dim, 32)

        
    def classifier(self,z):
        return F.softmax((self.cf3(self.cf2D(F.relu(self.cf2N(self.cf2(self.cf1D(F.relu(self.cf1N(self.cf1(z)))))))))))
            

    def forward(self, z):
        return self.classifier(z)
    

def MultiClass_loss_function(out, label):

    secLoss = F.cross_entropy(out, label.long())

    return secLoss

You might consider inheriting from the Sequential module and using a layer list to make your module a bit easier to reason about:

(I have nn.functional imported as nnf)

class MultiClass(nn.Sequential):
    
    def __init__(self, z_dim):
        super(MultiClass, self).__init__()

        layers = [
            ("cf1" , nn.Linear(z_dim, z_dim)),
            ("cf1N", nn.BatchNorm1d(num_features = z_dim)),
            ("r1"  , nn.ReLU()),
            ("cf1D", nn.Dropout(0.5) ),
            ("cf2" , nn.Linear(z_dim, z_dim)),
            ("cf2N", nn.BatchNorm1d(num_features = z_dim)),
            ("r2"  , nn.ReLU()),
            ("cf2D", nn.Dropout(0.5)),
            ("cf3" , nn.Linear(z_dim, 32)),
        ]
        
        for name, layer in layers:
            self.add_module(name, layer)

        
    def classifier(self, z):
        return nnf.softmax(self(z), dim=1)

x_dim also does not seem to be used anywhere?
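Instantiating it with z_dim=2000 (which I assume matches your feature size) and printing the module shows the structure:

print(MultiClass(z_dim=2000))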

MultiClass(
  (cf1): Linear(in_features=2000, out_features=2000, bias=True)
  (cf1N): BatchNorm1d(2000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (cf1D): Dropout(p=0.5)
  (cf2): Linear(in_features=2000, out_features=2000, bias=True)
  (cf2N): BatchNorm1d(2000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (cf2D): Dropout(p=0.5)
  (cf3): Linear(in_features=2000, out_features=32, bias=True)
)

How is your data structured?

Most classification tasks are accomplished by predicting probabilities over a tensor whose length equals the number of classes, rather than predicting the index of the class directly. E.g. ImageNet has 1000 categories, so a label for goldfish would have the value 1.0 at index 1 of a length-1000 tensor.
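For instance, a class index can be expanded into that kind of one-hot target like this (just an illustration, not necessarily something your loss needs):

import torch
import torch.nn.functional as nnf

label = torch.tensor(1)                                 # "goldfish" class index
one_hot = nnf.one_hot(label, num_classes=1000).float()  # length-1000 tensor
one_hot[1]                                              # tensor(1.)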

Sorry if I’m telling you what you already know! But some more detail about the code/task would be helpful. :smile_cat:

Another thing to remember:

The output of the softmax function lies in [0, 1]. So softmax applied to your example label tensor would give something like:

>>> nnf.softmax(torch.tensor([2, 5, 31, 7]).float())
tensor([2.5437e-13, 5.1091e-12, 1.0000e+00, 3.7751e-11])

Heeello,

I agree with @tymokvo that more background on the problem would help (or an explanation of why it isn’t needed).

Luckily I found one possible error in your code. Your forward/classifier function passes the output through a softmax, whereas your loss function, cross_entropy, does not want this (it applies log-softmax internally). See the docs on cross_entropy and NLLLoss.
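Just to illustrate (a small sketch with made-up logits): cross_entropy already applies log-softmax internally, so adding your own softmax ends up applying it twice.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 32)           # raw model outputs, no softmax
target = torch.tensor([2, 5, 31, 7])

a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
# a and b are equal; softmax followed by cross_entropy would not be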


:unamused: I totally missed the loss function using F.cross_entropy at the bottom of the code snippet. Sorry about that! Disregard the pedantic note about label shapes above.

I do think the function is a bit confusing in this case. The docs say:

input has to be a Tensor of size either (minibatch, C) or (minibatch, C, d_1, d_2, …, d_K) with K ≥ 1 for the K-dimensional case (described later).

and

This criterion expects a class index in the range [0, C−1] as the target for each value of a 1D tensor of size minibatch;

So the network is still supposed to return scores over a dimension whose size equals your number of classes, as opposed to the single index per sample that you described.

But the label is as you describe: a tensor of integer indices, where each entry is the integer class label that corresponds to the ground truth for the matching input.

Since the shapes all seem to match (32 on the output), it sounds like an activation function problem, as @Oli mentioned.
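To make the shapes concrete, here is a tiny sketch with stand-in values (not your real data):

import torch
import torch.nn.functional as nnf

out = torch.randn(4, 32)              # stand-in for the model output: (minibatch, C) raw scores
label = torch.tensor([2, 5, 31, 7])   # (minibatch,) class indices in [0, C-1]
loss = nnf.cross_entropy(out, label)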


@Oli @tymokvo

Thanks for the replies. I removed the softmax layer; I’m not sure if that is the right thing to do, because I know that softmax is used for multi-class classification.

Basically I am trying to build a super simple multi-class classifier in PyTorch! I have done this in Keras easily, but I’m not sure what I’m doing wrong here.

Here is the new model, and I am sure it is wrong because the prediction is always the same.

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.relu(out)
        out = self.layer2(out)
        return out
    

train_x = Variable(torch.from_numpy(np.asanyarray(x_train[x_train.columns[:-1]]))).float()
train_y = Variable(torch.LongTensor(x_train_label)).long()

test_x = Variable(torch.from_numpy(np.asanyarray(x_test[x_test.columns[:-1]]))).float()
test_y = Variable(torch.LongTensor(x_test_label)).long()

The input and labels look like this:

train_x, x_train_label
(tensor([[ 4.3734, 14.7227, 14.0051,  ..., 10.8181, 10.1554, 10.5231],
         [ 1.3698, 16.1158, 16.2395,  ..., 10.0263, 10.1859, 11.0192],
         [ 5.4781, 13.8524, 13.4969,  ..., 10.2149, 10.1836, 11.0694],
         ...,
         [ 0.9918, 15.1710, 12.5128,  ...,  9.5816,  9.3345, 10.6840],
         [ 0.0000, 11.7931, 11.7571,  ...,  9.9917, 10.2152, 11.6365],
         [ 0.8513, 13.1537, 11.8680,  ...,  9.6107, 10.6265, 10.2632]]),
 tensor([28, 22, 19,  ..., 26, 29,  5]))

tr_latent_X = data_utils.TensorDataset(train_x, train_y)
te_latent_X = data_utils.TensorDataset(test_x,  test_y)
train_loader_X = torch.utils.data.DataLoader(dataset=tr_latent_X,
                                             batch_size=bs,
                                             shuffle=False)

test_loader_X = torch.utils.data.DataLoader(dataset=te_latent_X,
                                             batch_size=bs,
                                             shuffle=False)
vae = NeuralNet(5000, 1000, 32)

loss_function = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(vae.parameters(), lr = 0.02)
def train(epoch):
    vae.train()

    train_loss = 0

    for batch_idx, (data,label) in enumerate(train_loader_X):

        optimizer.zero_grad()

        out = vae(data)
        loss = loss_function(out, label)

        loss.backward()
        train_loss += loss.item()
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader_X.dataset),
                100. * batch_idx / len(train_loader_X), loss.item() / len(data)))
    print('====> Epoch: {} Average loss: {:.4f}'.format(epoch, train_loss / len(train_loader_X.dataset)))

for epoch in range(1, 10):
    train(epoch)
Train Epoch: 1 [0/6483 (0%)]	Loss: 0.093940
Train Epoch: 1 [6400/6483 (98%)]	Loss: 0.051790
====> Epoch: 1 Average loss: 3.8580
Train Epoch: 2 [0/6483 (0%)]	Loss: 0.052003
Train Epoch: 2 [6400/6483 (98%)]	Loss: 0.051288
====> Epoch: 2 Average loss: 0.0511
Train Epoch: 3 [0/6483 (0%)]	Loss: 0.051706
Train Epoch: 3 [6400/6483 (98%)]	Loss: 0.051227
====> Epoch: 3 Average loss: 0.0508
Train Epoch: 4 [0/6483 (0%)]	Loss: 0.051529
Train Epoch: 4 [6400/6483 (98%)]	Loss: 0.051222
====> Epoch: 4 Average loss: 0.0507
Train Epoch: 5 [0/6483 (0%)]	Loss: 0.051466
Train Epoch: 5 [6400/6483 (98%)]	Loss: 0.051220
====> Epoch: 5 Average loss: 0.0507
Train Epoch: 6 [0/6483 (0%)]	Loss: 0.051450
Train Epoch: 6 [6400/6483 (98%)]	Loss: 0.051218
====> Epoch: 6 Average loss: 0.0507
Train Epoch: 7 [0/6483 (0%)]	Loss: 0.051448
Train Epoch: 7 [6400/6483 (98%)]	Loss: 0.051215
====> Epoch: 7 Average loss: 0.0507
Train Epoch: 8 [0/6483 (0%)]	Loss: 0.051449
Train Epoch: 8 [6400/6483 (98%)]	Loss: 0.051212
====> Epoch: 8 Average loss: 0.0507
Train Epoch: 9 [0/6483 (0%)]	Loss: 0.051451
Train Epoch: 9 [6400/6483 (98%)]	Loss: 0.051210
====> Epoch: 9 Average loss: 0.0507
out = []
k = []

for data,label in test_loader_X:
    out.append(vae(data))
    k.append(label)

and here is what the predictions on the test set look like:

[tensor([[ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         ...,
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857]],
        grad_fn=<AddmmBackward>),
 tensor([[ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         ...,
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857],
         [ 0.3506,  0.4214,  0.6405,  ..., -0.5726,  0.3580, -1.2857]],
        grad_fn=<AddmmBackward>),

The network and the flow seem OK to me. Can you try the following things?

  1. Reduce the learning rate to 0.0005 or so and see the result (a sketch combining this with point 2 follows the list)?
  2. Why don’t you initially use a simple optimizer like SGD and see the output?
  3. Can you provide more information about the input data? I see values ranging roughly from 0 to 12; did you perform any normalization before passing the data to the neural network?
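For example, a minimal sketch of points 1 and 2 together (reusing your vae model; untested on your data):

# plain SGD with a smaller learning rate, instead of Adam at 0.02
optimizer = torch.optim.SGD(vae.parameters(), lr=0.0005)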

I switched the optimizer to SGD; the result is still the same. The learning rate is 0.0001.

The input data is log2(gene expression matrix).

Do you think that I need a softmax layer at the end?

No, cross entropy loss has softmax built into it.

I am not familiar with gene expression data, but maybe normalizing the data to lie between 0 and 1 might help.
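For example, a minimal sketch of per-feature min-max scaling on your train_x tensor (just an illustration; the 1e-8 guards against constant features):

mins = train_x.min(dim=0, keepdim=True)[0]
maxs = train_x.max(dim=0, keepdim=True)[0]
train_x = (train_x - mins) / (maxs - mins + 1e-8)  # each feature now roughly in [0, 1]
test_x  = (test_x - mins) / (maxs - mins + 1e-8)   # reuse the training statistics for the test set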

Hi Rojin. So you shouldn’t use a softmax layer in your model here, since your loss function already applies it for you. However, when you are doing your testing, your loss function isn’t used. That’s why you have to apply softmax “manually”. I suggest that you add a predict function to your model that looks something like the one below. (Could be a typo or something in there, never tested it.)

def predict(self, x):
  with torch.no_grad():
    outp = self.forward(x)
    return F.softmax(outp, dim=1)  # per-row class probabilities
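To then check accuracy on the test set, you could do something like this (assuming you add predict to your NeuralNet; also untested):

preds = vae.predict(test_x).argmax(dim=1)           # predicted class index per row
accuracy = (preds == test_y).float().mean().item()
print(accuracy)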