LSTM Output Transposed

I run PyTorch 1.10 in production with an LSTM model. I just tested the 1.13.0.dev20220620 nightly build on a MacBook Pro M1 Max, and the LSTM model output comes back with the order of dimensions reversed:

Model IN: [batch, seq, input]
Model OUT: [seq, batch, output]

Model OUT should be [batch, seq, output]. The issue occurs in 1.13 whether the device is CPU or MPS.
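
For reference, a quick shape check like the following (a minimal sketch with made-up sizes, not my production code) makes the transposition easy to spot:

import torch
import torch.nn as nn

batch, seq, n_in, n_hidden = 4, 9, 16, 8  # made-up sizes for illustration

model = nn.LSTM(input_size=n_in, hidden_size=n_hidden, batch_first=True)
x = torch.randn(batch, seq, n_in)
out, _ = model(x)

# With batch_first=True the output should keep the input's [batch, seq] layout.
assert out.shape == (batch, seq, n_hidden), f"unexpected output shape {tuple(out.shape)}"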

I have made no modifications to my production code, which runs on 1.10 and produces the correct output.

I am new to the forum, so please excuse me if this issue is already known, and I realize 1.13 is not a stable release yet. I just wanted to highlight it in case it helps improve the release.


I cannot reproduce the issue using a current nightly release and get:

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False)
x = torch.randn(seq_len, batch_size, input_size)

out, (h, c) = model(x)
print(out.shape)
# torch.Size([7, 2, 5]) == [seq_len, batch_size, hidden_size]
print(h.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size] # D = 1 since bidirectional=False
print(c.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size]

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

out, (h, c) = model(x)
print(out.shape)
# torch.Size([2, 7, 5]) == [batch_size, seq_len, hidden_size]
print(h.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size] # D = 1 since bidirectional=False
print(c.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size]

Are you using the default batch_first=False setup or are you changing it to True?

EDIT: I don’t have an M1, so could you execute the code on your laptop and post the outputs here, please?

Thanks, Patrick. I edited your code so the device can be set to cpu or mps and have posted the output of each run on my MBP M1 Max below. You can see that the order is reversed in one case: mps with batch_first=True. With mps and batch_first=False the order is correct.

device = torch.device("cpu")
print(device)

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False).to(device)
x = torch.randn(seq_len, batch_size, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True).to(device)
x = torch.randn(batch_size, seq_len, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

cpu
torch.Size([7, 2, 10])
torch.Size([7, 2, 5])
torch.Size([2, 7, 10])
torch.Size([2, 7, 5])

--------------------------------------------------------
device = torch.device("mps")
print(device)

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False).to(device)
x = torch.randn(seq_len, batch_size, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True).to(device)
x = torch.randn(batch_size, seq_len, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

mps
torch.Size([7, 2, 10])
torch.Size([7, 2, 5])
torch.Size([2, 7, 10])
torch.Size([7, 2, 5])
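
To check whether only the layout is off or the values are also different, something along these lines could be run (just a sketch; on the affected nightly out_mps comes back as [7, 2, 5], so it is transposed back before comparing to the CPU result):

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.LSTM(input_size=10, hidden_size=5, num_layers=1, batch_first=True)
x = torch.randn(2, 7, 10)

out_cpu, _ = model(x)
out_mps, _ = model.to("mps")(x.to("mps"))
print(out_mps.shape)  # [7, 2, 5] on the affected build instead of [2, 7, 5]

# If only the layout is wrong, transposing the MPS result back should recover the CPU values.
print(torch.allclose(out_cpu, out_mps.cpu().transpose(0, 1), atol=1e-5))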

Thanks for the update! As this indeed looks wrong for this platform, could you create a GitHub issue and reference this thread there, please?

Done:


Unfortunately, the linked issue was closed, but the bug still exists for certain LSTMs. I hit it in a bidirectional LSTM with two layers, at least. Here is a minimal example in which I would expect the two results to be identical, but in fact they are not:

import torch

torch.manual_seed(1234)
lstm = torch.nn.LSTM(5, 5, num_layers=2, bidirectional=True, batch_first=True, dropout=0.5)
lstm.eval()  # disable dropout so the two forward passes are deterministic and comparable
inp = torch.randn(1, 4, 5)
print(torch.linalg.norm(lstm(inp)[0]).item())
print(torch.linalg.norm(lstm.to("mps")(inp.to("mps"))[0]).item())
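
To get a sense of how large the discrepancy is, rather than just comparing norms, the maximum elementwise difference can be printed as well. This sketch disables dropout with eval() so the only difference between the two runs should be the backend:

import torch

torch.manual_seed(1234)
lstm = torch.nn.LSTM(5, 5, num_layers=2, bidirectional=True, batch_first=True, dropout=0.5)
lstm.eval()  # dropout off, so any discrepancy comes from the backend
inp = torch.randn(1, 4, 5)

out_cpu = lstm(inp)[0]
out_mps = lstm.to("mps")(inp.to("mps"))[0].cpu()

# Maximum elementwise difference between the CPU and MPS outputs.
print((out_cpu - out_mps).abs().max().item())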

Oh, it’s definitely still broken. I got the sense that this is not a priority to address, so I just dropped the issue.

I was hoping the issue might be limited to batch_first=True, since then we could at least transpose all of our tensors, but setting batch_first to False is still bugged:

import torch

torch.manual_seed(1234)
lstm = torch.nn.LSTM(5, 5, num_layers=2, bidirectional=True, batch_first=False, dropout=0.5)
lstm.eval()  # disable dropout so the two forward passes are deterministic and comparable
inp = torch.randn(4, 1, 5)
print(torch.linalg.norm(lstm(inp)[0]).item())
print(torch.linalg.norm(lstm.to("mps")(inp.to("mps"))[0]).item())

Unfortunately, our NLP package at Stanford uses bi-LSTMs extensively, which makes the MPS backend unusable for us until this is fixed.
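
Until the backend is fixed, one possible stopgap (just a sketch, not what our package actually does) is to pin the LSTM to the CPU and shuttle tensors across devices around it:

import torch
import torch.nn as nn

class CPUFallbackLSTM(nn.Module):
    """Hypothetical wrapper that runs the wrapped LSTM on the CPU and returns
    results on the caller's device. The caller must keep this submodule on the
    CPU, e.g. by calling .cpu() on it after moving the rest of the model to mps."""
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.lstm = nn.LSTM(*args, **kwargs)

    def forward(self, x, hx=None):
        device = x.device
        if hx is not None:
            hx = tuple(t.cpu() for t in hx)
        out, (h, c) = self.lstm(x.cpu(), hx)
        return out.to(device), (h.to(device), c.to(device))

The obvious cost is the host-device copies on every forward pass, so this is only worth it while the MPS LSTM kernels are broken.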

I don’t use bidirectional LSTMs, and my LSTM models crash when using mps. Best to stick with GPUs. I don’t think there will be a lot of progress on Mac MPS in the near term.