Converting a Keras LSTM model to PyTorch

I am trying to convert an LSTM & Embedding model from Keras to PyTorch. The Keras model summary looks like this:

[Keras model summary screenshot]

The converted PyTorch model looks like this:

class test_model(nn.Module):
    def __init__(self):
        super(test_model, self).__init__()
        # emb_size is a helper from my code that picks the embedding dimension
        self.embed0 = nn.Embedding(12, emb_size(12))
        self.embed1 = nn.Embedding(53, emb_size(53))
        self.embed2 = nn.Embedding(12, emb_size(12))
        self.LSTM = nn.LSTM(16, 64, batch_first=True)
        self.linear = nn.Linear(64, 186)

    def forward(self, X):
        embed_0 = self.embed0(X[:, :, 0])
        embed_1 = self.embed1(X[:, :, 1])
        embed_2 = self.embed2(X[:, :, 2])
        X = torch.cat((embed_0, embed_1, embed_2), dim=-1)
        output, hidden = self.LSTM(X)
        return self.linear(output)
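As a sanity check on the embedding concatenation: the three embedding outputs are concatenated along the last dimension, so their embedding dimensions must sum to the LSTM's input_size of 16. The dimensions below (6 + 6 + 4) are an assumption for illustration; the actual values come from the `emb_size` helper in my code.

```python
import torch
import torch.nn as nn

# Hypothetical embedding dims; their sum (6 + 6 + 4 = 16) must match
# the LSTM's input_size for the forward pass to work.
e0 = nn.Embedding(12, 6)
e1 = nn.Embedding(53, 6)
e2 = nn.Embedding(12, 4)

# (batch, seq_len, 3 categorical features), indices valid for all tables
X = torch.randint(0, 12, (32, 186, 3))
cat = torch.cat((e0(X[:, :, 0]), e1(X[:, :, 1]), e2(X[:, :, 2])), dim=-1)

print(cat.shape)  # torch.Size([32, 186, 16]) -- matches nn.LSTM(16, 64, ...)
```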

However, I am getting a different loss. The dataset, optimizer, and loss function are the same across Keras and PyTorch. Any ideas?

I’m not deeply familiar with reading the Keras output, but are you using two Dense layers at the end?
The softmax (Dense) output shape is given as (None, 186, 12) which doesn’t fit your nn.Linear output. If that’s the case, I’m unsure why a shape mismatch error isn’t raised when e.g. calculating the loss.


Do you need nn.Softmax at the end of your torch model?

        nn.Linear(..., num_classes),
        nn.Softmax(dim=1)

Torch and tf handle softmax differently. Take a look at these examples:
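A minimal sketch of the difference: PyTorch's cross-entropy loss applies log-softmax internally, so the model should output raw logits, whereas in Keras you typically put a softmax activation on the final Dense layer. The shapes below are arbitrary for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of 4 samples and 12 classes.
logits = torch.randn(4, 12)
target = torch.randint(0, 12, (4,))

# F.cross_entropy applies log-softmax internally, so no nn.Softmax
# layer should precede it in the model.
loss_from_logits = F.cross_entropy(logits, target)

# Equivalent formulation: explicit log-softmax + negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_from_logits, loss_manual))  # True
```

Applying an extra nn.Softmax before CrossEntropyLoss would effectively softmax the data twice and distort the loss.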

Thanks for the reply. I changed my module to look like this. The reshape at the end is to make sure it predicts 12 classes. I think the issue might be in training. Any idea if I am using the cat layer correctly, and whether the reshape is right for cross entropy?

class test_model(nn.Module):
    def __init__(self):
        super(test_model, self).__init__()
        self.embed0 = nn.Embedding(12, emb_size(12))
        self.embed1 = nn.Embedding(53, emb_size(53))
        self.embed2 = nn.Embedding(12, emb_size(12))
        self.LSTM = nn.LSTM(16, 64, batch_first=True)
        self.linear1 = nn.Linear(64, 64)
        self.linear2 = nn.Linear(64, 12)

    def forward(self, X):
        embed_0 = self.embed0(X[:, :, 0])
        embed_1 = self.embed1(X[:, :, 1])
        embed_2 = self.embed2(X[:, :, 2])
        X = torch.cat((embed_0, embed_1, embed_2), dim=-1)
        output, hidden = self.LSTM(X)
        X = self.linear1(output)
        X = self.linear2(X)
        return X.reshape(-1, 12, 186)

Also, do you happen to know if the Keras layer
LSTM(memory_units, return_sequences=True, stateful=False, name='lstm')
would be equivalent to
nn.LSTM(16, 64, batch_first=True)
using its output (rather than the hidden state)?
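For what it's worth, nn.LSTM always returns the per-timestep outputs as its first return value, which corresponds to Keras' return_sequences=True; the final hidden state comes back separately. A quick sketch with the shapes from this thread:

```python
import torch
import torch.nn as nn

# First return value = outputs for every timestep (Keras return_sequences=True).
# For return_sequences=False you would take output[:, -1, :] instead.
lstm = nn.LSTM(input_size=16, hidden_size=64, batch_first=True)

x = torch.randn(32, 186, 16)        # (batch, seq_len, features)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([32, 186, 64]) -- one vector per timestep
print(h_n.shape)     # torch.Size([1, 32, 64])   -- final hidden state only
```

For a single-layer unidirectional LSTM, h_n[0] equals output[:, -1, :], i.e. the last timestep of the full sequence output.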

You don't need to use softmax when using CrossEntropyLoss. CrossEntropyLoss — PyTorch 1.10 documentation

I would recommend checking the shape of X before the reshape operation, since flattening the data into the batch dimension, as done via X.reshape(-1, 12, 186), is usually wrong.
If you are not careful, you could change the batch size of X, which would then create shape mismatches, e.g. when calculating the loss.

Thanks. Without the reshape my output size is torch.Size([32, 186, 12]); if I pass this to the loss (cross entropy) it gives me an error. Basically, the number of classes is 12, right? If I follow the CrossEntropyLoss docs, I need to change it to (N, C, d1).

Yes, your interpretation of the expected input shape is correct.
However, the reshape operation is wrong as it would interleave the values.
Since you want to swap some dimensions, use X = X.permute(0, 2, 1).contiguous() instead.
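A small toy tensor makes the difference concrete: reshape just reinterprets the flat memory layout, scrambling which logit belongs to which class, while permute actually swaps the axes. (The tiny shapes below are illustrative; the same applies to the (32, 186, 12) output.)

```python
import torch

# Toy tensor shaped like the model output: (batch, seq_len, classes).
x = torch.arange(2 * 3 * 4).reshape(2, 3, 4)  # (N=2, seq=3, C=4)

# reshape keeps the original memory order, so values get interleaved...
wrong = x.reshape(2, 4, 3)

# ...while permute swaps the axes to (N, C, seq), which is what
# nn.CrossEntropyLoss expects for per-timestep targets.
right = x.permute(0, 2, 1).contiguous()

print(torch.equal(wrong, right))                       # False
print(torch.equal(right[:, 0, :], x[:, :, 0]))         # True: class 0 logits intact
```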