LSTM with same inputs in a batch

Hi,

I am new to PyTorch. I am currently using a simple LSTM encoder for my variable-length sequences, and I am noticing something strange. When I feed identical inputs in a batch, the LSTM output for each of them is different. That is, if my variable-length sequences all begin with the input 1, the hidden state after the first step is different for every one of them.

What am I doing wrong? (I am quite sure it is the self.hidden that I am passing that creates this issue.)

pack_wv = torch.nn.utils.rnn.pack_padded_sequence(batch_in_wv_sorted, seq_lengths_sorted, batch_first=True)
out, (ht,ct) = self.lstm(pack_wv, self.hidden)

When I change the last line to
out, (ht,ct) = self.lstm(pack_wv)

The issue goes away. But I am afraid this will lead to incorrect behavior, since we are expected to pass self.hidden as the initialised first hidden state.

Can you show how the LSTM and hidden state are initialized?

self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
self.hidden = (autograd.Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_dim)),
               autograd.Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_dim)))

If I don’t do random, the issue goes away.

That makes sense: you're initializing a random hidden state, so each sequence in the batch starts from its own random initial state, and you're going to get different outputs for the same input.
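
A minimal way to see this (the sizes here are made up, not from your model): feed the same sequence twice in one batch with a per-sequence random initial state, and the two outputs diverge.

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)

x = torch.randn(1, 5, 4)          # one sequence of length 5
batch = x.repeat(2, 1, 1)         # the same sequence twice in one batch

# shape (num_layers, batch_size, hidden_dim): each batch element gets its own random state
h0, c0 = torch.randn(1, 2, 3), torch.randn(1, 2, 3)

out, _ = lstm(batch, (h0, c0))
print(torch.allclose(out[0], out[1]))   # False: identical inputs, different outputs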

If you were trying to make them the same, you could initialize a zero hidden state (which is what leaving it as None does) or an n_layers x 1 x hidden_dim random state that you repeat batch_size times over dimension 1. However, I can't imagine why that would be useful… if you're building a seq2seq-type model the states will be different between batches anyway.
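
For concreteness, here is a sketch of both options; the sizes and tensor names are placeholders, not from your code.

import torch
import torch.nn as nn

num_layers, batch_size, hidden_dim, embedding_dim = 1, 4, 8, 8   # placeholder sizes
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)

# Option 1: zero initial state -- equivalent to passing no hidden state at all
hidden = (torch.zeros(num_layers, batch_size, hidden_dim),
          torch.zeros(num_layers, batch_size, hidden_dim))

# Option 2: one random state repeated over the batch dimension,
# so every sequence in the batch starts from the same random state
h0 = torch.randn(num_layers, 1, hidden_dim).repeat(1, batch_size, 1)
c0 = torch.randn(num_layers, 1, hidden_dim).repeat(1, batch_size, 1)

x = torch.randn(1, 5, embedding_dim).repeat(batch_size, 1, 1)   # same sequence repeated in a batch
out, _ = lstm(x, (h0, c0))
print(torch.allclose(out[0], out[-1]))   # True: identical inputs now give identical outputs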