Categorical LSTM confused with shape of data and batching

I'm creating a part-of-speech tagging LSTM and I'm not sure how to shape the data when batching sentences of similar length.

X.shape = (100, 60, 4) [batch, sentence length, features per word]
output.shape = (100, 60, 10) [batch, sentence length, word type (10 possible types)]
y.shape = (100, 60)

The error I get: Expected target size (100, 10), got torch.Size([100, 60])

Below is the code for my network:

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.lstm1 = nn.GRU(input_size=4, hidden_size=32, bidirectional=True)
        self.fcn1 = nn.Linear(64, 512)
        self.fcn2 = nn.Linear(512, 512)
        self.fcn3 = nn.Linear(512, 10)

        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, x):
        x, _ = self.lstm1(x)
        x = self.fcn1(x)
        x = self.fcn2(x)
        x = self.fcn3(x)
        x = x.squeeze(1)

        return x


net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

for epoch in range(5000):
    
    optimizer.zero_grad()
    
    running_loss = 0.0
    outputs = net(X)
    print(outputs.shape, y.shape)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
        
    if epoch % 100 == 0:
        print("Epoch:", epoch, "Loss:", running_loss)

Thanks for the help!

nn.GRU expects the input in the shape [seq_len, batch_size, input_features] by default.
You could use batch_first=True to pass the inputs as [batch_size, seq_len, input_features], which seems to match your current input shape.
The output will have the shape [batch_size, seq_len, hidden*num_directions].
The following linear layers will apply their operations on dim1 “in a loop”, i.e. the linear transformation will be applied independently to each time step in the seq_len dimension.
The last x.squeeze(1) won’t have any effect, as the temporal dimension will stay at 60 in dim1.

Also, you don’t have any non-linearities between the linear layers, so you might want to add them.
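Putting that together, a rough sketch of the model could look like this (keeping your layer sizes; the ReLU placement is just one option):

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # batch_first=True keeps the input as [batch, seq_len, features]
        self.lstm1 = nn.GRU(input_size=4, hidden_size=32,
                            bidirectional=True, batch_first=True)
        self.fcn1 = nn.Linear(64, 512)
        self.fcn2 = nn.Linear(512, 512)
        self.fcn3 = nn.Linear(512, 10)

    def forward(self, x):
        # x: [batch, seq_len, 4] -> [batch, seq_len, 64] (32 hidden units * 2 directions)
        x, _ = self.lstm1(x)
        x = F.relu(self.fcn1(x))  # non-linearity between the linear layers
        x = F.relu(self.fcn2(x))
        x = self.fcn3(x)          # [batch, seq_len, 10] raw logits
        return x

Since you are already using nn.CrossEntropyLoss, returning the raw logits is fine, as the loss applies log_softmax internally.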

Thanks for the reply. I also had to change the loss function. Cross Entropy does not seem to work for many-to-many LSTM. Instead I am using

criterion = nn.MSELoss()

The shape of output is [100, 60, 10] and the shape of y is [100, 60, 10], where the 10 is one-hot encoded. I can’t get the cross-entropy criterion to work on data in this form.

What is your number of classes and what do the dimensions of the output represent?
Are you working on a multi-class classification, where each time step would correspond to a single class?
If so, nn.CrossEntropyLoss expects the model output to have the shape [batch_size, nb_classes, seq_len] and the target as [batch_size, seq_len] containing the class indices in the range [0, nb_classes-1].
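A minimal shape check with random tensors (the sizes are just taken from your example) would be:

import torch
import torch.nn as nn

batch_size, seq_len, nb_classes = 100, 60, 10

criterion = nn.CrossEntropyLoss()
output = torch.randn(batch_size, nb_classes, seq_len)         # [100, 10, 60]
target = torch.randint(0, nb_classes, (batch_size, seq_len))  # [100, 60], class indices
loss = criterion(output, target)
print(loss)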


I am working on a multi-class classification where each t_i needs to be classified.

Not sure how I would go about transforming the output:

the current output is [batch size, sequence length, number of classes]
the target is already correct: [batch size, sequence length]

output = output.permute(0, 2, 1)
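In your training loop that would be something like this (assuming the model output is [batch_size, seq_len, nb_classes] and y contains the class indices as a LongTensor):

outputs = net(X)                               # [100, 60, 10]
loss = criterion(outputs.permute(0, 2, 1), y)  # input [100, 10, 60], target [100, 60]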


Thank you! It worked!!
