I am currently working on a multi-task learning (MTL) problem. I have Keras code that I want to convert to PyTorch.
I noticed that the number of learnable parameters for an LSTM block in PyTorch differs from the number in the Keras code. My input size is 29 and my hidden (output) size is 32.
Using the link below, the number of learnable parameters for Keras comes out to
params = 4 * ((size_of_input + 1) * size_of_output + size_of_output^2) = 7936
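As a sanity check, here is that formula evaluated with my sizes (input 29, output 32):

```python
input_size = 29
hidden_size = 32

# Keras LSTM: 4 gates, each with input weights, recurrent weights, and one bias vector
keras_params = 4 * ((input_size + 1) * hidden_size + hidden_size ** 2)
print(keras_params)  # 7936
```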
Using the PyTorch documentation, the learnable parameters come out to 8064, which I got by summing the sizes of all the learnable weight and bias tensors.
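This is how I added them up, following the parameter shapes listed in the `torch.nn.LSTM` documentation (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`):

```python
input_size = 29
hidden_size = 32

# Shapes from the torch.nn.LSTM docs for a single layer:
weight_ih = 4 * hidden_size * input_size    # weight_ih_l0: (4*hidden, input)
weight_hh = 4 * hidden_size * hidden_size   # weight_hh_l0: (4*hidden, hidden)
bias_ih = 4 * hidden_size                   # bias_ih_l0:   (4*hidden,)
bias_hh = 4 * hidden_size                   # bias_hh_l0:   (4*hidden,)

pytorch_params = weight_ih + weight_hh + bias_ih + bias_hh
print(pytorch_params)  # 8064
```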
Am I mistaken or am I missing something?
Finally, is equalizing the number of learnable parameters in the two models a good way to verify that the computation graph is the same for both?