Train a multi-class classification NN with one custom embedding layer and a single linear layer

I have a training_iterator whose batches look like this:

batch.TEXT = (tensor([[   0,    3,   31, 2104,    0,  102,  847,  538,   23,  974],
         [   0,   24,  577, 4782,   16, 1191,  296,  385,  396,    0],
         [  21,    4,    8,    5,  214, 3549, 2079,    0,  982, 2246],
         [ 116,  346,   12,   30,   89,   38,   45,  491,    5,    7],
         [   3,   69,   56, 1372, 1704,  124, 3906,  530,   81,    0],
         [ 122,   97,  771,   10,   67, 1986,  945,  236,    0,   20],
         [   3,   17,  423,   35,    4,    0,  387,  285,  251,   37],
         [1524,  322, 4511,   28,   30,   12, 1199,  288, 1129,    3],
         [ 398, 2070, 2646,  113,  201,    2, 1748,   20,   72, 1525],
         [  62,  301,  929,    2, 1149, 2092,  524,   20,  286, 2425],
         [ 102, 1722, 1865,  123, 3541,    8,  163,    2,   54, 2688],
         [ 762,    0,   11, 2367,  276,    0,   10, 4082,    9,  182],
         [2966,  434, 2187,  704, 2247,   16,    0,    0,    0,    9],
         [ 543,  699,  543,  699,  326,  699,  692,    0,   16,    9],
         [1790,  391,    3,  690,   18,   76,    4,  264,  499, 3703],
         [   3,   17,   32,  776,   71,   92,  158, 1818,    5,    7],
         [ 416,   11,  811,  194, 1034,    0,  642, 2010, 4232, 4232],
         [ 164,  178,   91, 1048,  279,    0, 1886,  748,  162,    2],
         [ 182,   20,  443,  697,   55, 3742,  229,   28, 1793, 1586],
         [2526, 4904, 2238,    3,   28,   30,    0,  311,  729,    0],
         [   2,  188,  616,  192,  697, 1493, 2161,    3,  394,   48],
         [ 701,    3,   17,    2,  823,  168,  130, 1165, 1554,    3],
         [ 684,  373,  721,  374,  366,    8,  649,   77,  214,   28],
         [  73,  232,  221,  309,    3,   73,  232,  221,  309,    3],
         [3875,  141,  638,    0, 2788,    0,  387,   20,  159,    2],
         [   4, 1261,    2,  425, 1166,   39,   49,   13, 1870,    0],
         [   3,   17,   15, 1698,   60, 1084, 1235,  368,  122,   30],
         [  81,  450,  388,   11,  211,  246, 1022, 1601,  548, 4300],
         [   0,  396,   38,   23,    4, 1251,    0,  193, 2721,   16],
         [ 785,    5,  140,    8, 2688,    2,   52,  107,    5,    7],
         [1574, 1430,   11,   44,  211,  246,  548, 3712,  349,  333],
         [1304,  789,   10,   77,   98,  810,  177,  649,   77,  617]]),
 tensor([10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
         10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]))

batch.LABEL = tensor([2, 1, 2, 2, 0, 0, 2, 2, 1, 0, 2, 2, 2, 0, 0, 1, 2, 0, 0, 0, 0, 0, 2, 2,
        0, 0, 0, 1, 1, 0, 0, 0]) #shape = 32

The first tensor contains the token indices of the sentences, while the second tensor contains the length of each sentence in the batch (here 10, i.e. 10 tokens per sentence for all 32 sentences of the batch).
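
To make the shapes concrete, here is a hypothetical stand-in for one such batch (random values, same shapes as above):

import torch

text = torch.randint(0, 5000, (32, 10))  #32 sentences, 10 token indices each
text_lengths = torch.full((32,), 10)     #one length per sentence, all 10 here
print(text.shape, text_lengths.shape)    #torch.Size([32, 10]) torch.Size([32])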

My neural network:

import torch
import torch.nn as nn

class MultiClassClassifer(nn.Module):
  #define all the layers used in the model
  def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
    
    #Constructor
    super(MultiClassClassifer, self).__init__()

    #embedding layer
    padding_idx = TEXT.vocab['<pad>'] #TEXT is the torchtext field defined in my notebook
    self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
    self.embedding.weight.requires_grad = True

    #output layer
    self.output = nn.Linear(embedding_dim, output_dim)

    #activation layer
    self.act = nn.Softmax(dim=1) #2d-tensor

    #initialize weights of embedding layer
    self.init_weights()

  def init_weights(self):

    initrange = 1.0
    
    self.embedding.weight.data.uniform_(-initrange, initrange)
  
  def forward(self, text, text_lengths):

    embedded = self.embedding(text)                       #[batch, seq_len, embedding_dim]
    embedded = torch.mean(embedded, dim=1, keepdim=True)  #[batch, 1, embedding_dim]
    print(embedded.shape)

    output = self.act(self.output(embedded))              #[batch, 1, output_dim]
    print(output.shape)

    return output
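
To illustrate where the extra dimension comes from, here is a minimal sketch of the shape flow in forward() with hypothetical sizes (vocab 5000, embedding dim 100, 3 classes), independent of the notebook's TEXT field:

import torch
import torch.nn as nn

embedding = nn.Embedding(5000, 100)
linear = nn.Linear(100, 3)

text = torch.randint(0, 5000, (32, 10))               #[batch, seq_len]
embedded = embedding(text)                            #[32, 10, 100]
embedded = torch.mean(embedded, dim=1, keepdim=True)  #[32, 1, 100] (keepdim keeps the middle dim)
output = linear(embedded)                             #[32, 1, 3], not [32, 3]
print(output.shape)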

During the calculation of the CrossEntropyLoss():

criterion = nn.CrossEntropyLoss()
loss = criterion(predictions, batch.LABEL)

I get the following error:

RuntimeError: Expected target size [32, 3], got [32]
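
For reference, a minimal reproduction of what nn.CrossEntropyLoss expects, raw scores of shape [batch, num_classes] and integer targets of shape [batch]:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(32, 3)           #[batch, num_classes]
targets = torch.randint(0, 3, (32,))  #[batch] integer class ids
loss = criterion(logits, targets)     #works

bad = torch.randn(32, 1, 3)           #the shape my model currently produces
#criterion(bad, targets)              #raises the RuntimeError above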

I am sure this is happening because predictions have shape [32, 1, 3] while batch.LABEL has shape [32].
How can I resolve this for my multi-class classification problem? Is it a data-loading problem, or is it the linear layer?

I have also published my Colab notebook in case anyone wants to take a look at my data and the network.

One possible solution I can think of is to transform the softmax output from shape [32, 3] to [32] so that it matches the shape of the labels. From the first row of probabilities, say [0.2, 0.3, 0.5], I would take the index of the maximum value (0.5), i.e. 2, to indicate that this sentence belongs to the third label. But since I am not familiar with PyTorch, I am not sure whether this is the correct approach.
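
In code, the transformation I have in mind would look something like this (hypothetical preds values), though I do not know whether discarding the probabilities this way is valid for training:

import torch

preds = torch.tensor([[0.2, 0.3, 0.5],
                      [0.7, 0.1, 0.2]])
labels_like = torch.argmax(preds, dim=1)  #tensor([2, 0]), index of the max per row
print(labels_like)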