Mismatch in output size for LSTMs by varying device

Running the code below on the CPU produces the correct output shape for an LSTM with batch_first=True.

import torch
import torch.nn as nn

lstm = nn.LSTM(
    input_size=8,
    hidden_size=512,
    batch_first=True,
    num_layers=2)
x = torch.randn(4, 80, 8)
h0 = torch.randn(2, 4, 512)
c0 = torch.randn(2, 4, 512)
lstm_out, hidden = lstm(x, (h0, c0))
print(lstm_out.shape)

yields

torch.Size([4, 80, 512]).
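For reference, with batch_first=True the output is expected to be (batch, seq_len, hidden_size); with batch_first=False (the default) it is (seq_len, batch, hidden_size). A minimal CPU sketch of both conventions:

```python
import torch
import torch.nn as nn

# batch_first=True: input and output are (batch, seq_len, features)
lstm_bf = nn.LSTM(input_size=8, hidden_size=512, batch_first=True, num_layers=2)
out_bf, _ = lstm_bf(torch.randn(4, 80, 8))
assert out_bf.shape == (4, 80, 512)

# batch_first=False (default): input and output are (seq_len, batch, features)
lstm_sf = nn.LSTM(input_size=8, hidden_size=512, num_layers=2)
out_sf, _ = lstm_sf(torch.randn(80, 4, 8))
assert out_sf.shape == (80, 4, 512)
```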

Simply changing the device breaks this by swapping the batch and sequence-length dimensions of the output. See below:

lstm = nn.LSTM(
    input_size=8,
    hidden_size=512,
    batch_first=True,
    num_layers=2).to('mps')
x = torch.randn(4, 80, 8).to('mps')
h0 = torch.randn(2, 4, 512).to('mps')
c0 = torch.randn(2, 4, 512).to('mps')
lstm_out, hidden = lstm(x, (h0, c0))
print(lstm_out.shape)

yields

torch.Size([80, 4, 512]).

Is this a known error? I ran the code with PyTorch 1.12, installed through pip, on a MacBook Pro with an Apple M1 Pro chip and macOS Monterey 12.4.

MPS has many issues, as it was developed only recently.
I don't think this is a known error, @ptrblck


Seems to be related to this issue.

The nightly build of 1.13 has now fixed this issue. Note, however, that MPS currently does not support LSTM backpropagation in any case; you can only run inference.
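Until you can move to a nightly build, one possible workaround (a sketch, assuming the only symptom is the swapped dimensions; `fix_lstm_out` is a hypothetical helper, not a PyTorch API) is to transpose the output back whenever the batch dimension ends up in position 1:

```python
import torch

def fix_lstm_out(lstm_out: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Hypothetical helper: swap dims 0 and 1 if the batch dimension
    landed in position 1. Assumes batch_size != seq_len; if they are
    equal the shapes are ambiguous and no check is possible."""
    if lstm_out.shape[0] != batch_size and lstm_out.shape[1] == batch_size:
        return lstm_out.transpose(0, 1)
    return lstm_out

# Simulate the buggy (seq_len, batch, hidden) output from the MPS example:
buggy = torch.randn(80, 4, 512)
fixed = fix_lstm_out(buggy, batch_size=4)
assert fixed.shape == (4, 80, 512)
```

A correctly shaped output passes through unchanged, so the helper is safe to leave in place after upgrading.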
