Issue with LSTM source code

I am using a bidirectional LSTM with batch_first=True. However, it is throwing me an error about dimensions:

Expected hidden[0] size (6, 5, 40), got (5, 6, 40)

When I checked the source code, the error comes from the function below:

    if is_input_packed:
        mini_batch = int(batch_sizes[0])
    else:
        mini_batch = input.size(0) if self.batch_first else input.size(1)

    num_directions = 2 if self.bidirectional else 1
    expected_hidden_size = (self.num_layers * num_directions,
                            mini_batch, self.hidden_size)

    def check_hidden_size(hx, expected_hidden_size, msg='Expected hidden size {}, got {}'):
        if tuple(hx.size()) != expected_hidden_size:
            raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))

The expected_hidden_size is always computed with the batch in the second dimension, i.e. as if the input were sequence-first, even when batch_first=True. I believe this is what causes the problem. Can someone advise whether I am right and whether this needs to be fixed?
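For reference, a minimal repro of the shape mismatch (the layer sizes are assumptions, chosen so the shapes match the error message: 3 layers x 2 directions = 6, batch = 5, hidden_size = 40):

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen to reproduce the error message above.
lstm = nn.LSTM(input_size=10, hidden_size=40, num_layers=3,
               batch_first=True, bidirectional=True)

x = torch.randn(5, 7, 10)    # (batch, seq, feature), as batch_first=True expects

# WRONG: building the hidden/cell states batch-first as well
h0 = torch.zeros(5, 6, 40)
c0 = torch.zeros(5, 6, 40)

try:
    lstm(x, (h0, c0))
except RuntimeError as e:
    print(e)  # a shape-mismatch error like the one quoted above
```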

There isn’t a sequence dimension in the hidden state.
So specifying batch_first does not affect the way you pass in the hidden state — it is always (num_layers * num_directions, batch, hidden_size) — and you would have to transpose it yourself.
While I see how this might not match your expectation, I’m not sure people will be keen to change the input specification, given that it has been this way for quite a while.
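To illustrate, here is a sketch with made-up sizes matching your error message (3 layers, bidirectional, batch of 5, hidden size 40) where the states are built in the layout the check expects:

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size, num_layers = 5, 7, 10, 40, 3

lstm = nn.LSTM(input_size, hidden_size, num_layers,
               batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)  # batch_first input

# h0/c0 are ALWAYS (num_layers * num_directions, batch, hidden_size),
# regardless of batch_first.
h0 = torch.zeros(num_layers * 2, batch, hidden_size)
c0 = torch.zeros(num_layers * 2, batch, hidden_size)

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)  # torch.Size([5, 7, 80]) — output IS batch-first (2*hidden for bidirectional)
print(hn.shape)   # torch.Size([6, 5, 40]) — hidden state is not
```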

Best regards



Yeah, transposing it helped. So even when I create the LSTM object with batch_first=True, it does not transpose h0 and c0 for me — I have to pass them as (num_layers * num_directions, batch, hidden_size) myself.