Mismatch in output size for LSTMs by varying device

Running the code below on the CPU produces the correct output shape for an LSTM with batch_first=True.

import torch
import torch.nn as nn

lstm = nn.LSTM(
    input_size=8,
    hidden_size=512,
    batch_first=True,
    num_layers=2)
x = torch.randn(4, 80, 8)
h0 = torch.randn(2, 4, 512)
c0 = torch.randn(2, 4, 512)
lstm_out, hidden = lstm(x, (h0, c0))
print(lstm_out.shape)

yields

torch.Size([4, 80, 512]).
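For reference, with batch_first=True the output is expected to be (batch, seq_len, hidden_size); with batch_first=False (the default) it is (seq_len, batch, hidden_size). A minimal CPU sketch of both conventions:

```python
import torch
import torch.nn as nn

# batch_first=True: input and output are (batch, seq_len, features)
lstm_bf = nn.LSTM(input_size=8, hidden_size=512, batch_first=True, num_layers=2)
out_bf, _ = lstm_bf(torch.randn(4, 80, 8))
assert out_bf.shape == (4, 80, 512)

# batch_first=False (default): input and output are (seq_len, batch, features)
lstm_sf = nn.LSTM(input_size=8, hidden_size=512, num_layers=2)
out_sf, _ = lstm_sf(torch.randn(80, 4, 8))
assert out_sf.shape == (80, 4, 512)
```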

Simply changing the device breaks this by swapping the batch and sequence-length dimensions of the output. See below:

lstm = nn.LSTM(
    input_size=8,
    hidden_size=512,
    batch_first=True,
    num_layers=2).to('mps')
x = torch.randn(4, 80, 8).to('mps')
h0 = torch.randn(2, 4, 512).to('mps')
c0 = torch.randn(2, 4, 512).to('mps')
lstm_out, hidden = lstm(x, (h0, c0))
print(lstm_out.shape)

yields

torch.Size([80, 4, 512]).

Is this a known error? I ran the code with PyTorch 1.12, installed through pip, on a MacBook Pro with an Apple M1 Pro chip and macOS Monterey 12.4.

MPS has many issues, as it was developed only recently.
I don't think this is a known error, @ptrblck


Seems to be related to this issue.

The nightly build of 1.13 has now fixed this issue. Note, however, that MPS currently does not support LSTM backpropagation in any case; you can only run inference.
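Until you can move to a nightly build, one possible workaround (a sketch, assuming the only symptom is the swapped dimensions; `fix_lstm_out` is a hypothetical helper, not a PyTorch API) is to transpose the output back whenever the batch dimension ends up in position 1:

```python
import torch

def fix_lstm_out(lstm_out: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Hypothetical helper: swap dims 0 and 1 if the batch dimension
    landed in position 1. Assumes batch_size != seq_len; if they are
    equal the shapes are ambiguous and no check is possible."""
    if lstm_out.shape[0] != batch_size and lstm_out.shape[1] == batch_size:
        return lstm_out.transpose(0, 1)
    return lstm_out

# Simulate the buggy (seq_len, batch, hidden) output from the MPS example:
buggy = torch.randn(80, 4, 512)
fixed = fix_lstm_out(buggy, batch_size=4)
assert fixed.shape == (4, 80, 512)
```

A correctly shaped output passes through unchanged, so the helper is safe to leave in place after upgrading.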
