LSTM: what if the output size differs from the label size?

Hi, I am new to PyTorch; I'd appreciate your help :slight_smile:
I am doing two-class classification.
My train.size() is (100L, 37L) and labels.size() is (100L,),
but when I reshape to x, x becomes (5L, 20L, 37L) (I set sequence_length=20),
so outputs.size() is (20L, 2L) while y.size() is (100L,).
If I set sequence_length=1 it runs, but then there is no point in using sequences.
How can I reshape y so that the labels correspond to the outputs?
Thank you very much.

# Train the model
for epoch in range(num_epochs):
    for i, (train, labels) in enumerate(train_loader):
        x = Variable(train.view(-1, sequence_length, input_size))
        y = Variable(labels)
        outputs = rnn(x)
        loss = criterion(outputs, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i + 1) % 20 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch + 1, num_epochs, i + 1,
                     len(trainDataset) // batch_size, loss.data[0]))

You can process the entire sequence and then only use the last output for classification.
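A minimal sketch of that idea, with sizes borrowed from the question (the hidden size of 64 is an assumption): run the whole sequence through the LSTM, then feed only the last time step's output to a classification layer.

```python
import torch
import torch.nn as nn

# Illustrative sizes; hidden_size=64 is an assumption, the rest match the question.
seq_len, batch_size, input_size, hidden_size, num_classes = 20, 5, 37, 64, 2

lstm = nn.LSTM(input_size, hidden_size)      # batch_first=False (the default)
fc = nn.Linear(hidden_size, num_classes)

x = torch.randn(seq_len, batch_size, input_size)
out, (h_n, c_n) = lstm(x)    # out: [seq_len, batch_size, hidden_size]
logits = fc(out[-1])         # last time step only -> [batch_size, num_classes]
print(logits.shape)          # torch.Size([5, 2])
```

With this scheme you need one label per sequence (5 here), not one label per time step.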

Thanks for the reply. :slight_smile:
But I want to obtain the all ouputs for this entire sequence,
every batch I have sequence_length=20, but the shape of target now is 100*1,
targets are not in any batch, could you explain more that will be help me a lot. Thanks.

The input to an LSTM is of shape [Sequence Length, Batch Size, Features]. Therefore, Sequence Length is not a hyper-parameter per se (you should not just set it to an arbitrary value); it depends on your data.
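A quick shape check illustrating this (the sizes here are just examples): the same nn.LSTM accepts any sequence length, because that dimension comes from the data tensor, not from the module's configuration.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=37, hidden_size=64)   # no sequence length anywhere

shapes = []
for seq_len in (10, 20, 50):                # varying sequence lengths
    x = torch.randn(seq_len, 5, 37)         # [seq_len, batch, features]
    out, _ = lstm(x)
    shapes.append(tuple(out.shape))         # [seq_len, batch, hidden]
print(shapes)   # [(10, 5, 64), (20, 5, 64), (50, 5, 64)]
```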

Your LSTM output will have the shape [Sequence Length, Batch Size, Hidden Size * Num Directions]. Depending on your task, you may want an additional layer in your model that maps the Hidden Size * Num Directions of each element of each sequence to your desired output size. For example, for binary classification you would add a Linear layer, Linear(Hidden Size * Num Directions, 1), and pass the output of each time step through it. Your forward pass would then return something of shape [Sequence Length, Batch Size].
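The mapping described above can be sketched like this (the hidden size and the choice of a bidirectional LSTM are assumptions for illustration): a Linear layer is applied to every time step of the LSTM output, so the model returns one score per time step per batch element.

```python
import torch
import torch.nn as nn

# Illustrative sizes: hidden=64 and bidirectional (directions=2) are assumptions.
seq_len, batch_size, features, hidden, directions = 20, 5, 37, 64, 2

lstm = nn.LSTM(features, hidden, bidirectional=(directions == 2))
head = nn.Linear(hidden * directions, 1)   # Hidden Size * Num Directions -> 1

x = torch.randn(seq_len, batch_size, features)
out, _ = lstm(x)                  # [seq_len, batch_size, hidden * directions]
scores = head(out).squeeze(-1)    # [seq_len, batch_size]
print(scores.shape)               # torch.Size([20, 5])
```

nn.Linear broadcasts over all leading dimensions, so applying it to the full [seq_len, batch, hidden * directions] tensor handles every time step in one call.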

I think this excellent example will also give you some ideas: https://github.com/spro/practical-pytorch/blob/master/char-rnn-classification/char-rnn-classification.ipynb

Thanks for the information and the help, timbmg, that's a good site!! :slight_smile: