Sequence multiclass classification

JustAGuysInThailand · April 8, 2021, 8:20pm

I have a dataset of shape [88, 498, 20], which represents 88 samples of length 498, where each time step is represented by 20 classes.
My output is [88, 498, 3], so it has the same shape as the input; the only difference is that I now have 3 classes to predict.

This is my first time with PyTorch. With Keras, I simply create Dense layers and use the categorical_crossentropy loss function.

For PyTorch, I tried to create the model like this:

import torch
import torch.nn as nn
import torch.optim as optim

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.dense1 = torch.nn.Linear(20, 128)
        self.relu = torch.nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.dense2 = torch.nn.Linear(128, 3)
        self.softmax = torch.nn.Softmax()

    def forward(self, x):
        x = self.dense1(x)
        x = self.relu(x)
        x = self.dense2(x)
        x = self.dropout(x)
        x = self.softmax(x)
        return x

and the training goes like this:

model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
train_losses = []
val_losses = []

for epoch in range(100):
    running_loss = 0.0
    optimizer.zero_grad()
    outputs = model(x_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

But I got this error from the loss function:

ValueError: Expected target size (88, 3), got torch.Size([88, 498, 3])

So I don’t know what I should change in the network, or how to determine the size of each Linear layer.

ptrblck · April 9, 2021, 7:14am

nn.CrossEntropyLoss expects raw logits, so you should remove the self.softmax from the forward method.
Assuming your current model output represents: [batch_size=88, seq_len=498, nb_classes=3], you would have to permute the output so that it has the shape [batch_size, nb_classes, seq_len] via output = output.permute(0, 2, 1).
Also the target shape should be [batch_size, seq_len] and it should contain class indices in the range [0, nb_classes-1].
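The three changes above (no softmax, permuted output, class-index targets) can be sketched roughly as follows. This is only an illustration under assumptions: the data is random stand-in data, the `nn.Sequential` model is a simplified version of the MLP from the question with the dropout moved before the last layer, and the one-hot targets are converted with `argmax`, which assumes `y_train` was one-hot encoded as in the Keras setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_len, in_features, nb_classes = 88, 498, 20, 3

# Per-time-step classifier without a final softmax: it returns raw logits.
model = nn.Sequential(
    nn.Linear(in_features, 128),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, nb_classes),
)

# Random stand-in for x_train.
x = torch.randn(batch_size, seq_len, in_features)

# Random stand-in for a one-hot encoded y_train of shape [88, 498, 3].
y_onehot = nn.functional.one_hot(
    torch.randint(0, nb_classes, (batch_size, seq_len)), nb_classes
).float()

# Convert one-hot targets to class indices: [88, 498], dtype int64.
target = y_onehot.argmax(dim=-1)

logits = model(x)                 # [88, 498, 3]
logits = logits.permute(0, 2, 1)  # [88, 3, 498] = [batch, classes, seq_len]

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
loss.backward()
```

nn.CrossEntropyLoss applies log-softmax internally, which is why the model should return logits rather than probabilities.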

JustAGuysInThailand · April 9, 2021, 7:51am

Thank you for your answer.
But there are a few things I don’t really understand.
First, about the input: my first layer is

nn.Linear(input_size, hidden_size)

so what is the input size here? Normally in Keras I just set input_size = (batch, seq_len, no_classes), but in PyTorch the only value for input_size that worked was 20, which is my no_classes. So in PyTorch, should we normally pass no_classes as the input size?

Second, I understand that the permute() you showed me is needed to fit the loss function. I think I have to make the target size [batch_size, seq_len], which means I will not need to convert my labels to one-hot encoding as in Keras, I believe. But can you show me how to set the number of samples in each batch? In Keras I just declare batch_size as the number of samples per batch.

Sorry for the newbie questions. I’m really new to PyTorch and have just moved from Keras, so my thinking doesn’t follow PyTorch properly yet, and the tutorials on the internet are not as detailed as those for Keras (I think because PyTorch is still new).

ptrblck · April 10, 2021, 12:27am

The input size to an nn.Linear layer is defined in the docs as:

Input: (N, ∗, H_in) where ∗ means any number of additional dimensions and H_in = in_features

In your case you would thus pass the input as [batch_size, seq_len, in_features].
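As a small illustration of that shape rule (using the sizes from the question and random data as a stand-in), nn.Linear only looks at the last dimension, so the same layer accepts any batch size and sequence length:

```python
import torch
import torch.nn as nn

# in_features must match the size of the LAST input dimension (20 here).
layer = nn.Linear(in_features=20, out_features=128)

x = torch.randn(88, 498, 20)  # [batch_size, seq_len, in_features]
out = layer(x)                # applied independently to every time step
print(out.shape)              # torch.Size([88, 498, 128])

# The same layer also works for a different batch size.
out_small = layer(torch.randn(4, 498, 20))
print(out_small.shape)        # torch.Size([4, 498, 128])
```

The batch and sequence dimensions are passed through unchanged; only the feature dimension is transformed.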

Are you sure you are passing the no_classes into the first layer in Keras? This seems like an unneeded limitation unless your model only contains a single layer.

Yes, the shape is correct and also one-hot encoded targets are not needed, but the target should contain the class indices.

The batch size is not defined in the model (the model does not depend on the number of samples passed to it); you can set the desired batch_size in the DataLoader.
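A minimal sketch of setting the batch size through a DataLoader, assuming the data is already available as tensors. The tensors here are random stand-ins with the shapes from the question, and batch_size=16 is an arbitrary choice for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins: inputs [88, 498, 20], class-index targets [88, 498].
x_train = torch.randn(88, 498, 20)
y_train = torch.randint(0, 3, (88, 498))

dataset = TensorDataset(x_train, y_train)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Collect the shape of every batch the loader yields in one epoch.
batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 88 samples and a batch size of 16, the loader yields five full batches and a final batch of 8 samples; passing drop_last=True to the DataLoader would skip that smaller last batch.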