I have a training_iterator whose batches look like this:
batch.TEXT = (tensor([[ 0, 3, 31, 2104, 0, 102, 847, 538, 23, 974],
[ 0, 24, 577, 4782, 16, 1191, 296, 385, 396, 0],
[ 21, 4, 8, 5, 214, 3549, 2079, 0, 982, 2246],
[ 116, 346, 12, 30, 89, 38, 45, 491, 5, 7],
[ 3, 69, 56, 1372, 1704, 124, 3906, 530, 81, 0],
[ 122, 97, 771, 10, 67, 1986, 945, 236, 0, 20],
[ 3, 17, 423, 35, 4, 0, 387, 285, 251, 37],
[1524, 322, 4511, 28, 30, 12, 1199, 288, 1129, 3],
[ 398, 2070, 2646, 113, 201, 2, 1748, 20, 72, 1525],
[ 62, 301, 929, 2, 1149, 2092, 524, 20, 286, 2425],
[ 102, 1722, 1865, 123, 3541, 8, 163, 2, 54, 2688],
[ 762, 0, 11, 2367, 276, 0, 10, 4082, 9, 182],
[2966, 434, 2187, 704, 2247, 16, 0, 0, 0, 9],
[ 543, 699, 543, 699, 326, 699, 692, 0, 16, 9],
[1790, 391, 3, 690, 18, 76, 4, 264, 499, 3703],
[ 3, 17, 32, 776, 71, 92, 158, 1818, 5, 7],
[ 416, 11, 811, 194, 1034, 0, 642, 2010, 4232, 4232],
[ 164, 178, 91, 1048, 279, 0, 1886, 748, 162, 2],
[ 182, 20, 443, 697, 55, 3742, 229, 28, 1793, 1586],
[2526, 4904, 2238, 3, 28, 30, 0, 311, 729, 0],
[ 2, 188, 616, 192, 697, 1493, 2161, 3, 394, 48],
[ 701, 3, 17, 2, 823, 168, 130, 1165, 1554, 3],
[ 684, 373, 721, 374, 366, 8, 649, 77, 214, 28],
[ 73, 232, 221, 309, 3, 73, 232, 221, 309, 3],
[3875, 141, 638, 0, 2788, 0, 387, 20, 159, 2],
[ 4, 1261, 2, 425, 1166, 39, 49, 13, 1870, 0],
[ 3, 17, 15, 1698, 60, 1084, 1235, 368, 122, 30],
[ 81, 450, 388, 11, 211, 246, 1022, 1601, 548, 4300],
[ 0, 396, 38, 23, 4, 1251, 0, 193, 2721, 16],
[ 785, 5, 140, 8, 2688, 2, 52, 107, 5, 7],
[1574, 1430, 11, 44, 211, 246, 548, 3712, 349, 333],
[1304, 789, 10, 77, 98, 810, 177, 649, 77, 617]]),
tensor([10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]))
batch.LABEL = tensor([2, 1, 2, 2, 0, 0, 2, 2, 1, 0, 2, 2, 2, 0, 0, 1, 2, 0, 0, 0, 0, 0, 2, 2,
0, 0, 0, 1, 1, 0, 0, 0]) #shape = 32
The first element of batch.TEXT contains the token indices of the sentences, while the second element holds the length of each sentence in the batch (here every one of the 32 sentences has exactly 10 tokens).
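In the training loop I unpack the batch roughly like this (just a sketch for context; the variable names are my own and assume the include_lengths=True tuple that torchtext returns):

text, text_lengths = batch.TEXT   # text: [32, 10] token indices, text_lengths: [32] sentence lengths
labels = batch.LABEL              # [32] class indices in {0, 1, 2}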
My neural network:

import torch
import torch.nn as nn

class MultiClassClassifer(nn.Module):
    # define all the layers used in the model
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        # constructor
        super(MultiClassClassifer, self).__init__()
        # embedding layer
        padding_idx = TEXT.vocab['<pad>']
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.embedding.weight.requires_grad = True
        # output layer
        self.output = nn.Linear(embedding_dim, output_dim)
        # activation layer
        self.act = nn.Softmax(dim=1)  # 2d-tensor
        # initialize weights of the embedding layer
        self.init_weights()

    def init_weights(self):
        initrange = 1.0
        self.embedding.weight.data.uniform_(-initrange, initrange)

    def forward(self, text, text_lengths):
        embedded = self.embedding(text)                        # [batch, seq_len, embedding_dim]
        embedded = torch.mean(embedded, dim=1, keepdim=True)   # [batch, 1, embedding_dim]
        print(embedded.shape)
        output = self.act(self.output(embedded))               # [batch, 1, output_dim]
        print(output.shape)
        return output
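To show where the extra middle dimension comes from, this is roughly what happens to the shapes inside forward (a standalone sketch with random numbers; embedding_dim=100 and the ad-hoc Linear layer are placeholders, not my real hyperparameters):

import torch
import torch.nn as nn

embedded = torch.randn(32, 10, 100)                  # [batch, seq_len, embedding_dim]
pooled = torch.mean(embedded, dim=1, keepdim=True)   # keepdim=True keeps the middle dim -> [32, 1, 100]
logits = nn.Linear(100, 3)(pooled)                   # Linear acts on the last dim -> [32, 1, 3]
print(pooled.shape, logits.shape)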
During the calculation of the loss with nn.CrossEntropyLoss():

criterion = nn.CrossEntropyLoss()
loss = criterion(predictions, batch.LABEL)
I get the following error:
RuntimeError: Expected target size [32, 3], got [32]
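As far as I understand, CrossEntropyLoss wants an input of shape [N, C] with targets of shape [N] for a plain classification case; with dummy tensors the 2-D input works, while a 3-D input like mine reproduces the same error (sketch):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
ok = criterion(torch.randn(32, 3), torch.randint(0, 3, (32,)))   # input [N, C], target [N]: works
# criterion(torch.randn(32, 1, 3), torch.randint(0, 3, (32,)))   # RuntimeError: Expected target size [32, 3], got [32]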
I am sure this is happening because predictions has shape [32, 1, 3] while batch.LABEL has shape [32].
How can I resolve this so it fits my multi-class classification problem? Is it a data loading problem, or is it an issue with the linear layer?
I have also published my Colab notebook in case anyone wants to take a look at my data and the network.
One possible solution I could think of is to transform the softmax output from shape [32, 3] to [32], so it matches the shape of the labels. For example, from the first row of probabilities [0.2, 0.3, 0.5] I would take the index of the maximum value (0.5), i.e. label 2, to indicate that this sentence belongs to the third class. But since I am not familiar with PyTorch I am not sure if this is the correct approach.
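In code, what I have in mind is something like this (just a sketch of the idea; I am not sure it is the right thing to feed into the loss):

# squeeze out the middle dimension, then take the most probable class per sentence
predicted_labels = predictions.squeeze(1).argmax(dim=1)   # [32, 1, 3] -> [32, 3] -> [32]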