How to extract the last outputs of an LSTM

I have some training text data of variable lengths. I first feed it into a char-based Embedding, then pack it using pack_padded_sequence, feed it into an LSTM, and finally unpack it with pad_packed_sequence.

At this point, I have a Variable of shape BATCH_SIZE x PAD_LENGTH x EMBEDDING_LEN and another Variable holding the real length of each data point. Given that, if I want the last non-padded output of each sequence for further steps and backprop (that is, I want to skip the trailing all-zero outputs that come from padding), what is the best way to do so? Thanks!
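
For reference, a minimal sketch of that pipeline (module sizes and variable names are made up for illustration; everything is batch-first):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

embedding = nn.Embedding(num_embeddings=128, embedding_dim=1024)  # made-up char vocab
lstm = nn.LSTM(input_size=1024, hidden_size=1024, batch_first=True)

tokens = torch.randint(0, 128, (64, 20))  # BATCH_SIZE x PAD_LENGTH of char ids
lens = torch.randint(1, 21, (64,))        # true length of each sequence
lens, order = torch.sort(lens, descending=True)  # pack_padded_sequence wants decreasing lengths
tokens = tokens[order]

packed = pack_padded_sequence(embedding(tokens), lens.tolist(), batch_first=True)
output, _ = lstm(packed)
data, _ = pad_packed_sequence(output, batch_first=True)  # BATCH_SIZE x max(lens) x 1024, zero-padded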

Hi,

you can just use seq[:, lengths - 1] (if your lengths are at least 1). That gives you the BATCH_SIZE x EMBEDDING_LEN tensor of last elements.

Best regards

Thomas

Thanks for the reply! As mentioned above, I have a 64 x 20 x 1024 result (data) and a 64-element variable (lens) that contains the lengths (all <= 20 in my case). When I did data[:, lens - 1], I got a Variable of 64 x 64 x 1024. Was I doing something wrong?

I’m pretty sure it’s batch-first, i.e., 64 is the batch size, 20 is the maximum padded length, and 1024 is the embedding size. Any thoughts? :sweat_smile:

A data piece that might be helpful:
data:

Variable containing:
( 0  ,.,.) = 
  0.0417 -0.0041 -0.0519  ...  -0.0086  0.0407 -0.0095
  0.0733 -0.0780 -0.0431  ...  -0.0061  0.0458  0.0193
  0.0938 -0.0610 -0.0333  ...  -0.0049  0.0135  0.0047
           ...             ⋱             ...          
  0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000

( 1  ,.,.) = 
  0.0657 -0.0431  0.0004  ...  -0.0016  0.0423 -0.0068
  0.0916 -0.0649  0.0253  ...   0.0122  0.0777 -0.0116
  0.1211 -0.0619  0.0082  ...   0.0020  0.0683  0.0131
           ...             ⋱             ...          
  0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
...
[torch.cuda.FloatTensor of size 64x20x1024 (GPU 0)]

lens:

Variable containing:
 17
 13
  7
 13
 16
 11
  6
  3
  8
  6
...
[torch.cuda.LongTensor of size 64 (GPU 0)]
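
For what it's worth, a small sketch with dummy tensors reproduces the shape: indexing only the time dimension with a tensor of 64 indices selects all 64 positions for every batch element, hence 64 x 64 x 1024.

import torch

data = torch.randn(64, 20, 1024)
lens = torch.randint(1, 21, (64,))
print(data[:, lens - 1].shape)  # torch.Size([64, 64, 1024])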

Hi,

sorry, I thought I had tested it, but I actually seem to have done something different. So this might work:
data[torch.arange(64, out=torch.LongTensor()), lens - 1]

Best regards

Thomas

That works (after adding .cuda()). Thanks man!

BTW, will this affect autograd? I am not sure whether the resulting Variable stays connected to the graph.
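
A quick way to check would be something like this (a sketch with dummy tensors, using the newer 0.4-style API):

data = torch.randn(64, 20, 1024, requires_grad=True)
idx = torch.arange(64, dtype=torch.int64)
last = data[idx, torch.randint(0, 20, (64,))]
last.sum().backward()
print(data.grad.abs().sum() > 0)  # True: gradients flow back through the indexing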

Hi Thomas,

I just wanted to report back that the code actually still gives a result of BATCH_SIZE x BATCH_SIZE x EMB_LEN (64 x 64 x 1024). Have you tried running the code?

Thanks!

Get it wrong once, never recover. :frowning:
But here is what I actually have tested:

data = torch.randn(64, 20, 1024)                    # BATCH_SIZE x PAD_LENGTH x EMBEDDING_LEN
last_elements = torch.randint(0, 20, (64,)).long()  # stands in for lens - 1
myrange = torch.arange(0, 64, dtype=torch.int64)    # one row index per batch element
print(data[myrange, last_elements])                 # 64 x 1024 tensor of last elements
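
For completeness, gather should pick the same elements if you prefer to avoid advanced indexing (a sketch reusing the tensors above):

idx = last_elements.view(-1, 1, 1).expand(-1, 1, data.size(2))
print(data.gather(1, idx).squeeze(1))  # same 64 x 1024 result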

Best regards

Thomas


Eh… I’m not sure whether I did something wrong, but on my end it appears that there is no torch.int64. Could you double-check?

Use .long() after arange instead, and revisit when PyTorch 0.4 is here. :slight_smile:
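
That is, something like this (untested sketch) should work on 0.3:

myrange = torch.arange(0, 64).long()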

I think the current PyTorch version is 0.3-something?