Running the code below (on the default CPU device) produces the correct output shape for an LSTM with batch_first=True.
import torch
import torch.nn as nn

lstm = nn.LSTM(
    input_size=8,
    hidden_size=512,
    batch_first=True,
    num_layers=2)
x = torch.randn(4, 80, 8)     # (batch, seq_len, input_size)
h0 = torch.randn(2, 4, 512)   # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 4, 512)
lstm_out, hidden = lstm(x, (h0, c0))
print(lstm_out.shape)
yields
torch.Size([4, 80, 512]), i.e. (batch, seq_len, hidden_size), as expected.
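For reference, the returned hidden states are documented to keep the (num_layers, batch, hidden_size) layout regardless of batch_first, so this check prints the same shape either way:
print(hidden[0].shape)   # torch.Size([2, 4, 512]); h_n ignores batch_first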
Simply changing the device to MPS breaks this by swapping the batch and sequence-length dimensions of the output. See below:
lstm = nn.LSTM(
    input_size=8,
    hidden_size=512,
    batch_first=True,
    num_layers=2).to('mps')
x = torch.randn(4, 80, 8).to('mps')
h0 = torch.randn(2, 4, 512).to('mps')
c0 = torch.randn(2, 4, 512).to('mps')
lstm_out, hidden = lstm(x, (h0, c0))
print(lstm_out.shape)
yields
torch.Size([80, 4, 512]), i.e. (seq_len, batch, hidden_size), as if batch_first=True were being ignored.
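As a temporary workaround, I can transpose the output back by hand. This is only a sketch, assuming the MPS backend really is returning the output sequence-first (and it is ambiguous when batch size equals sequence length):
# assumes MPS returns (seq_len, batch, hidden_size) instead of (batch, seq_len, hidden_size)
if lstm_out.shape[:2] == (x.shape[1], x.shape[0]):   # output looks sequence-first
    lstm_out = lstm_out.transpose(0, 1)              # back to (batch, seq_len, hidden_size)
print(lstm_out.shape)   # torch.Size([4, 80, 512])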
Is this a known error? I ran the code with PyTorch 1.12 installed through pip
on a MacBook Pro with an Apple M1 Pro chip running macOS Monterey 12.4.
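For completeness, this is how I confirmed the environment (torch.backends.mps.is_available() is available as of PyTorch 1.12):
print(torch.__version__)                  # e.g. 1.12.0
print(torch.backends.mps.is_available())  # True on this machine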