Attempting to understand LSTM's input

I’m trying to figure out how PyTorch LSTM takes input. I’ve read the documentation, but I’d like someone more experienced to confirm or correct what I’ve gathered so far.

First, let’s establish some notation in accordance with the documentation.

  1. N = Batch Size
  2. L = Sequence Length
  3. H_in = input_size, where input_size is defined as

The number of expected features in the input x

where x is defined as the input at time t if I’ve understood correctly.
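
If I’ve read the documentation correctly, this means that with batch_first=True the LSTM expects its input as a tensor of shape (N, L, H_in), i.e. (batch, sequence length, features per timestep).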

I’ll provide 2 examples.

Example - 1:

Say I have 3 instances/sequences per batch, so N = 3. Each instance/sequence is represented as [X, Y], where X and Y are numbers, so L = 2, and X and Y correspond to the first and second timestep respectively. Each instance here could, for example, be the numerical representation of a sentence.

Therefore the correct way to do it for 1 layer and a hidden size of 4 would be like this:

import torch
import torch.nn as nn

batch_tensor = torch.tensor([
    # The first sequence [1, 2] of length 2 where 1 is the first timestep and 2 is the second timestep
    [[1], [2]],

    # The second sequence [4, 5] of length 2 where 4 is the first timestep and 5 is the second timestep    
    [[4], [5]],  

    # The third sequence [7, 8] of length 2 where 7 is the first timestep and 8 is the second timestep    
    [[7], [8]]  
], dtype=torch.float32)

print(batch_tensor.shape)

# Outputs -> torch.Size([3, 2, 1])

# input_size should be 1 as each timestep has dimensionality of 1
lstm = nn.LSTM(input_size=1, hidden_size=4, num_layers=1, batch_first=True)
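
To double-check my understanding, here is a minimal sketch of the forward pass I would run on this batch; the shapes in the comments are what I expect from the documentation for hidden_size=4, not something I’m asserting beyond that.

output, (h_n, c_n) = lstm(batch_tensor)

print(output.shape)  # Expect torch.Size([3, 2, 4]) -> (N, L, hidden_size): one hidden state per timestep
print(h_n.shape)     # Expect torch.Size([1, 3, 4]) -> (num_layers, N, hidden_size): final hidden state per sequence
print(c_n.shape)     # Expect torch.Size([1, 3, 4]) -> (num_layers, N, hidden_size): final cell state per sequence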

Example - 2:

This time I have 3 instances/sequences per batch, so N = 3. Each instance/sequence is represented as [X, Y], where X and Y are vectors this time, so L = 2, and X and Y correspond to the first and second timestep respectively. Each instance here could, for example, be the representation of a 2-word sentence where X is the word embedding of the first word and Y is the word embedding of the second word.

Therefore the correct way to do it for 1 layer and a hidden size of 4 would be like this:

batch_tensor = torch.tensor([
    # The first sequence [[1, 1.5], [2, 2.5]] of length 2 where [1, 1.5] is the first timestep and [2, 2.5] is the second timestep
    [[1, 1.5], [2, 2.5]],

    # The second sequence [[4, 4.5], [5, 5.5]] of length 2 where [4, 4.5] is the first timestep and [5, 5.5] is the second timestep
    [[4, 4.5], [5, 5.5]],

    # The third sequence [[7, 7.5], [8, 8.5]] of length 2 where [7, 7.5] is the first timestep and [8, 8.5] is the second timestep
    [[7, 7.5], [8, 8.5]]
], dtype=torch.float32)

print(batch_tensor.shape)

# Outputs -> torch.Size([3, 2, 2])

# input_size should be 2 as each timestep has dimensionality of 2
lstm = nn.LSTM(input_size=2, hidden_size=4, num_layers=1, batch_first=True)
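
And the same sanity check for this example, plus an extra check of my own (not from the documentation example) that the final hidden state matches the last timestep of the output:

output, (h_n, c_n) = lstm(batch_tensor)

print(output.shape)  # Expect torch.Size([3, 2, 4]) -> (N, L, hidden_size)

# For a single-layer, unidirectional LSTM, h_n should equal the
# last timestep of `output` for every sequence in the batch.
print(torch.allclose(output[:, -1, :], h_n[0]))  # Expect True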

Questions:

  1. Are my examples correct?
  2. In both examples, after playing around to see if something would break, I set input_size to 2000 and it still worked, and I wonder why that is.

Answer:

  1. From what I can tell, you are using it correctly.
  2. Why would a larger input size break something? In the simplest case of an RNN at one time step, if we took an input of size 2,000 and matmuled it by weights of size (2,000, 4), we would get an output of size 4. Nothing unusual there.
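
A tiny sketch of that shape arithmetic, using the 2,000 and 4 from the explanation above (the tensors here are made up purely for illustration):

import torch

x = torch.randn(2000)     # one timestep with 2,000 input features
W = torch.randn(2000, 4)  # weights mapping 2,000 features to 4 hidden units

h = x @ W                 # matmul: (2000,) @ (2000, 4) -> (4,)
print(h.shape)            # torch.Size([4])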