Converting a Keras LSTM model to PyTorch

I am trying to convert an LSTM & Embedding model from Keras to PyTorch. The Keras model summary looks like this:

[Keras model summary screenshot]

The converted PyTorch model looks like this:

class test_model(nn.Module):
    def __init__(self):
        super(test_model, self).__init__()
        # emb_size is a helper from my code that picks the embedding dimension
        self.embed0 = nn.Embedding(12, emb_size(12))
        self.embed1 = nn.Embedding(53, emb_size(53))
        self.embed2 = nn.Embedding(12, emb_size(12))
        self.LSTM = nn.LSTM(16, 64, batch_first=True)
        self.linear = nn.Linear(64, 186)

    def forward(self, X):
        embed_0 = self.embed0(X[:, :, 0])
        embed_1 = self.embed1(X[:, :, 1])
        embed_2 = self.embed2(X[:, :, 2])
        X = torch.cat((embed_0, embed_1, embed_2), dim=-1)
        output, hidden = self.LSTM(X)
        return self.linear(output)
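As a sanity check on the embedding concatenation: the three embedding outputs are concatenated along the last dimension, so their embedding dimensions must sum to the LSTM's input_size of 16. The dimensions below (6 + 6 + 4) are an assumption for illustration; the actual values come from the `emb_size` helper in my code.

```python
import torch
import torch.nn as nn

# Hypothetical embedding dims; their sum (6 + 6 + 4 = 16) must match
# the LSTM's input_size for the forward pass to work.
e0 = nn.Embedding(12, 6)
e1 = nn.Embedding(53, 6)
e2 = nn.Embedding(12, 4)

# (batch, seq_len, 3 categorical features), indices valid for all tables
X = torch.randint(0, 12, (32, 186, 3))
cat = torch.cat((e0(X[:, :, 0]), e1(X[:, :, 1]), e2(X[:, :, 2])), dim=-1)

print(cat.shape)  # torch.Size([32, 186, 16]) -- matches nn.LSTM(16, 64, ...)
```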

However, I am getting a different loss. The dataset, optimizer, and loss function are the same across Keras and PyTorch. Any ideas?

I’m not deeply familiar with reading the Keras output, but are you using two Dense layers at the end?
The softmax (Dense) output shape is given as (None, 186, 12) which doesn’t fit your nn.Linear output. If that’s the case, I’m unsure why a shape mismatch error isn’t raised when e.g. calculating the loss.


Do you need nn.Softmax at the end of your torch model?

        nn.Linear(..., num_classes),
        nn.Softmax(dim=1)

Torch and tf handle softmax differently. Take a look at these examples:
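A minimal sketch of the difference: PyTorch's cross-entropy loss applies log-softmax internally, so the model should output raw logits, whereas in Keras you typically put a softmax activation on the final Dense layer. The shapes below are arbitrary for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of 4 samples and 12 classes.
logits = torch.randn(4, 12)
target = torch.randint(0, 12, (4,))

# F.cross_entropy applies log-softmax internally, so no nn.Softmax
# layer should precede it in the model.
loss_from_logits = F.cross_entropy(logits, target)

# Equivalent formulation: explicit log-softmax + negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_from_logits, loss_manual))  # True
```

Applying an extra nn.Softmax before CrossEntropyLoss would effectively softmax the data twice and distort the loss.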

Thanks for the reply. I changed my module to look like this. The reshape at the end is to make sure it predicts 12 classes. I think the issue might be in training. Any idea if I am using the cat layer correctly, and whether the reshape is right for cross entropy?

class test_model(nn.Module):
    def __init__(self):
        super(test_model, self).__init__()
        self.embed0 = nn.Embedding(12, emb_size(12))
        self.embed1 = nn.Embedding(53, emb_size(53))
        self.embed2 = nn.Embedding(12, emb_size(12))
        self.LSTM = nn.LSTM(16, 64, batch_first=True)
        self.linear1 = nn.Linear(64, 64)
        self.linear2 = nn.Linear(64, 12)

    def forward(self, X):
        embed_0 = self.embed0(X[:, :, 0])
        embed_1 = self.embed1(X[:, :, 1])
        embed_2 = self.embed2(X[:, :, 2])
        X = torch.cat((embed_0, embed_1, embed_2), dim=-1)
        output, hidden = self.LSTM(X)
        X = self.linear1(output)
        X = self.linear2(X)
        return X.reshape(-1, 12, 186)

Also, do you happen to know if the Keras layer
LSTM(memory_units, return_sequences=True, stateful=False, name='lstm')
would be equivalent to
nn.LSTM(16, 64, batch_first=True)
using its output (rather than the hidden state)?
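For what it's worth, nn.LSTM always returns the per-timestep outputs as its first return value, which corresponds to Keras' return_sequences=True; the final hidden state comes back separately. A quick sketch with the shapes from this thread:

```python
import torch
import torch.nn as nn

# First return value = outputs for every timestep (Keras return_sequences=True).
# For return_sequences=False you would take output[:, -1, :] instead.
lstm = nn.LSTM(input_size=16, hidden_size=64, batch_first=True)

x = torch.randn(32, 186, 16)        # (batch, seq_len, features)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([32, 186, 64]) -- one vector per timestep
print(h_n.shape)     # torch.Size([1, 32, 64])   -- final hidden state only
```

For a single-layer unidirectional LSTM, h_n[0] equals output[:, -1, :], i.e. the last timestep of the full sequence output.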

You don't need to use softmax when using CrossEntropyLoss. CrossEntropyLoss — PyTorch 1.10 documentation

I would recommend checking the shape of X before the reshape operation, since flattening the data into the batch dimension, as done via X.reshape(-1, 12, 186), is usually wrong.
If you are not careful, you could change the batch size of X, which would then create shape mismatches, e.g. when calculating the loss.

Thanks. Without the reshape my output size is torch.Size([32, 186, 12]); if I pass this to the loss (cross entropy) it gives me an error. Basically, the number of classes is 12, right? If I follow the CrossEntropyLoss docs, I need to change it to (N, C, d1).

Yes, your interpretation of the expected input shape is correct.
However, the reshape operation is wrong as it would interleave the values.
Since you want to swap some dimensions, use X = X.permute(0, 2, 1).contiguous() instead.
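A small toy tensor makes the difference concrete: reshape just reinterprets the flat memory layout, scrambling which logit belongs to which class, while permute actually swaps the axes. (The tiny shapes below are illustrative; the same applies to the (32, 186, 12) output.)

```python
import torch

# Toy tensor shaped like the model output: (batch, seq_len, classes).
x = torch.arange(2 * 3 * 4).reshape(2, 3, 4)  # (N=2, seq=3, C=4)

# reshape keeps the original memory order, so values get interleaved...
wrong = x.reshape(2, 4, 3)

# ...while permute swaps the axes to (N, C, seq), which is what
# nn.CrossEntropyLoss expects for per-timestep targets.
right = x.permute(0, 2, 1).contiguous()

print(torch.equal(wrong, right))                       # False
print(torch.equal(right[:, 0, :], x[:, :, 0]))         # True: class 0 logits intact
```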