LSTM Output Transposed

I run PyTorch 1.10 in production with an LSTM model. I just tested the 1.13.0.dev20220620 nightly build on a MacBook Pro M1 Max, and the LSTM model output comes back with the order reversed:

Model IN: [batch, seq, input]
Model OUT: [seq, batch, output]

Model OUT should be [batch, seq, output]. The issue occurs in 1.13 whether the device is CPU or MPS.
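
For reference, a minimal sketch of the shape contract I rely on (the sizes are made up for illustration; this is not the production model):

import torch
import torch.nn as nn

# Illustrative sizes only, not the production values.
batch, seq, n_in, n_hidden = 4, 12, 8, 16

lstm = nn.LSTM(input_size=n_in, hidden_size=n_hidden, batch_first=True)
x = torch.randn(batch, seq, n_in)   # Model IN:  [batch, seq, input]
out, _ = lstm(x)                    # Model OUT should be [batch, seq, output]

assert out.shape == (batch, seq, n_hidden), out.shape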

I have made no modifications to my production code, which runs on 1.10 and outputs correctly.

I am new to the forum, so excuse me if this issue is already known, and I realize 1.13 is not a stable release yet. I just wanted to highlight the behavior in case it is helpful in improving the release.


I cannot reproduce the issue using a current nightly release and get:

import torch
import torch.nn as nn

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False)
x = torch.randn(seq_len, batch_size, input_size)

out, (h, c) = model(x)
print(out.shape)
# torch.Size([7, 2, 5]) == [seq_len, batch_size, hidden_size]
print(h.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size] # D = 1 since bidirectional=False
print(c.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size]

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

out, (h, c) = model(x)
print(out.shape)
# torch.Size([2, 7, 5]) == [batch_size, seq_len, hidden_size]
print(h.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size] # D = 1 since bidirectional=False
print(c.shape)
# torch.Size([1, 2, 5]) == [D*num_layers, batch_size, hidden_size]

Are you using the default batch_first=False setup or are you changing it to True?

EDIT: I don’t have an M1, so could you execute the code on your laptop and post the outputs here, please?

Thanks, Patrick. I edited your code so the device can be set to cpu or mps, and I post the output from each run on my MBP M1 Max below. You can see that the order is reversed in exactly one case: mps with batch_first=True. With mps and batch_first=False the order is correct.

import torch
import torch.nn as nn

device = torch.device("cpu")
print(device)

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False).to(device)
x = torch.randn(seq_len, batch_size, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True).to(device)
x = torch.randn(batch_size, seq_len, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

cpu
torch.Size([7, 2, 10])
torch.Size([7, 2, 5])
torch.Size([2, 7, 10])
torch.Size([2, 7, 5])

--------------------------------------------------------
device = torch.device("mps")
print(device)

batch_size = 2
seq_len = 7
input_size = 10

# default setup with batch_first=False
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=False).to(device)
x = torch.randn(seq_len, batch_size, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

# batch_first=True
model = nn.LSTM(input_size=input_size, hidden_size=5, num_layers=1, batch_first=True).to(device)
x = torch.randn(batch_size, seq_len, input_size).to(device)
print(x.shape)
out, (h, c) = model(x)
print(out.shape)

mps
torch.Size([7, 2, 10])
torch.Size([7, 2, 5])
torch.Size([2, 7, 10])
torch.Size([7, 2, 5])
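
A possible stop-gap until this is fixed, assuming the only symptom is that the batch and sequence dimensions come back swapped when batch_first=True on mps, would be to transpose the output back manually. This helper is just a sketch, not production code:

import torch
import torch.nn as nn

def lstm_forward_guarded(lstm: nn.LSTM, x: torch.Tensor):
    # Run a batch_first LSTM and undo the dim swap if the output comes back
    # as [seq, batch, hidden] instead of [batch, seq, hidden].
    out, state = lstm(x)
    if (
        lstm.batch_first
        and x.shape[0] != x.shape[1]                   # check is only unambiguous when batch != seq
        and out.shape[:2] == (x.shape[1], x.shape[0])  # first two dims look swapped
    ):
        out = out.transpose(0, 1).contiguous()
    return out, state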

Thanks for the update! Since it indeed looks wrong on this platform, could you create a GitHub issue and reference this thread there, please?

Done:
