LSTM Output Transposed

I run PyTorch 1.10 in production with an LSTM model. I just tested the 1.13.0.dev20220620 nightly build on a MacBook Pro M1 Max, and the LSTM model output comes back with the order of dimensions reversed:

Model IN: [batch, seq, input]
Model OUT: [seq, batch, output]

Model OUT should be [batch, seq, output]. The issue occurs in 1.13 whether the device is CPU or MPS.
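
For reference, a quick shape check like the following (a minimal sketch with made-up sizes, not my production code) makes the transposition easy to spot:

import torch
import torch.nn as nn

batch, seq, n_in, n_hidden = 4, 9, 16, 8  # made-up sizes for illustration

model = nn.LSTM(input_size=n_in, hidden_size=n_hidden, batch_first=True)
x = torch.randn(batch, seq, n_in)
out, _ = model(x)

# With batch_first=True the output should keep the input's [batch, seq] layout.
assert out.shape == (batch, seq, n_hidden), f"unexpected output shape {tuple(out.shape)}"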

I have made no modifications to my production code, which runs on 1.10 and produces the correct output.

I am new to the forum, so please excuse me if this issue is already known, and I realize 1.13 is not a stable release yet. I just wanted to highlight it in case it helps improve the release.


I cannot reproduce the issue using a current nightly release and get:

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False)
x = torch.randn(seq_len, batch_size, input_size)

out, (h, c) = model(x)
print(out.shape)
# torch.Size([7, 2, 5]) == [seq_len, batch_size, hidden_size]
print(h.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size] # D = 1 since bidirectional=False
print(c.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size]

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

out, (h, c) = model(x)
print(out.shape)
# torch.Size([2, 7, 5]) == [batch_size, seq_len, hidden_size]
print(h.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size] # D = 1 since bidirectional=False
print(c.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size]

Are you using the default batch_first=False setup or are you changing it to True?

EDIT: I don’t have an M1, so could you execute the code on your laptop and post the outputs here, please?

Thanks, Patrick. I edited your code so the device can be set to cpu or mps and have posted the output of each run on my MBP M1 Max below. You can see that the order is reversed in one case: mps with batch_first=True. With mps and batch_first=False the order is correct.

device = torch.device("cpu")
print(device)

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False).to(device)
x = torch.randn(seq_len, batch_size, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True).to(device)
x = torch.randn(batch_size, seq_len, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

cpu
torch.Size([7, 2, 10])
torch.Size([7, 2, 5])
torch.Size([2, 7, 10])
torch.Size([2, 7, 5])

--------------------------------------------------------
device = torch.device("mps")
print(device)

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False).to(device)
x = torch.randn(seq_len, batch_size, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True).to(device)
x = torch.randn(batch_size, seq_len, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

mps
torch.Size([7, 2, 10])
torch.Size([7, 2, 5])
torch.Size([2, 7, 10])
torch.Size([7, 2, 5])
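
To check whether only the layout is off or the values are also different, something along these lines could be run (just a sketch; on the affected nightly out_mps comes back as [7, 2, 5], so it is transposed back before comparing to the CPU result):

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.LSTM(input_size=10, hidden_size=5, num_layers=1, batch_first=True)
x = torch.randn(2, 7, 10)

out_cpu, _ = model(x)
out_mps, _ = model.to("mps")(x.to("mps"))
print(out_mps.shape)  # [7, 2, 5] on the affected build instead of [2, 7, 5]

# If only the layout is wrong, transposing the MPS result back should recover the CPU values.
print(torch.allclose(out_cpu, out_mps.cpu().transpose(0, 1), atol=1e-5))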

Thanks for the update! As this indeed looks wrong for this platform, could you create a GitHub issue and reference this thread there, please?

Done:


Unfortunately, the linked issue was closed, but the bug still exists for certain LSTMs. I hit it in a bidirectional LSTM with two layers, at least. Here is a minimal example in which I would expect the two results to be identical, but in fact they are not:

import torch

torch.manual_seed(1234)
lstm = torch.nn.LSTM(5, 5, num_layers=2, bidirectional=True, batch_first=True, dropout=0.5)
lstm.eval()  # disable dropout so the two forward passes are deterministic and comparable
inp = torch.randn(1, 4, 5)
print(torch.linalg.norm(lstm(inp)[0]).item())
print(torch.linalg.norm(lstm.to("mps")(inp.to("mps"))[0]).item())
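
To get a sense of how large the discrepancy is, rather than just comparing norms, the maximum elementwise difference can be printed as well. This sketch disables dropout with eval() so the only difference between the two runs should be the backend:

import torch

torch.manual_seed(1234)
lstm = torch.nn.LSTM(5, 5, num_layers=2, bidirectional=True, batch_first=True, dropout=0.5)
lstm.eval()  # dropout off, so any discrepancy comes from the backend
inp = torch.randn(1, 4, 5)

out_cpu = lstm(inp)[0]
out_mps = lstm.to("mps")(inp.to("mps"))[0].cpu()

# Maximum elementwise difference between the CPU and MPS outputs.
print((out_cpu - out_mps).abs().max().item())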

Oh, it’s definitely still broken. I got the sense that this is not a priority to address, so I just dropped the issue.

I was hoping the issue might be limited to batch_first=True, since then we could at least transpose all of our tensors, but setting batch_first to False is still bugged:

import torch

torch.manual_seed(1234)
lstm = torch.nn.LSTM(5, 5, num_layers=2, bidirectional=True, batch_first=False, dropout=0.5)
lstm.eval()  # disable dropout so the two forward passes are deterministic and comparable
inp = torch.randn(4, 1, 5)
print(torch.linalg.norm(lstm(inp)[0]).item())
print(torch.linalg.norm(lstm.to("mps")(inp.to("mps"))[0]).item())

Unfortunately, our NLP package at Stanford uses bi-LSTMs extensively, which makes the MPS backend unusable for us until this is fixed.
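
Until the backend is fixed, one possible stopgap (just a sketch, not what our package actually does) is to pin the LSTM to the CPU and shuttle tensors across devices around it:

import torch
import torch.nn as nn

class CPUFallbackLSTM(nn.Module):
    """Hypothetical wrapper that runs the wrapped LSTM on the CPU and returns
    results on the caller's device. The caller must keep this submodule on the
    CPU, e.g. by calling .cpu() on it after moving the rest of the model to mps."""
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.lstm = nn.LSTM(*args, **kwargs)

    def forward(self, x, hx=None):
        device = x.device
        if hx is not None:
            hx = tuple(t.cpu() for t in hx)
        out, (h, c) = self.lstm(x.cpu(), hx)
        return out.to(device), (h.to(device), c.to(device))

The obvious cost is the host-device copies on every forward pass, so this is only worth it while the MPS LSTM kernels are broken.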

I don’t use bidirectional LSTMs, and my LSTM models crash when using mps. Best to stick with GPUs. I don’t think there will be a lot of progress on Mac MPS in the near term.