Help with a simple Keras to PyTorch model implementation

Hi all,

I am trying to reimplement a Keras model in PyTorch. I have written my own version, but I would like to compare it with a version from someone more experienced with Keras than I am, to see whether theirs has fewer parameters, etc.

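# Imports assume standalone Keras; use tensorflow.keras instead if that is what is installed.
from keras.models import Sequential
from keras.layers import Activation, BatchNormalization, Dense, Dropout, GRU, TimeDistributed

N_CONTEXT = 61
N_INPUT = 26
N_HIDDEN = 256
N_OUTPUT = 414

model = Sequential()
model.add(TimeDistributed(Dense(N_HIDDEN), input_shape=(N_CONTEXT, N_INPUT)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.1))

model.add(GRU(N_HIDDEN, return_sequences=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.1))

model.add(Dense(N_OUTPUT))
model.add(Activation('linear'))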

Thank you

Do you see any discrepancy between the number of parameters of your Keras model and the PyTorch model you have written?
If so, which layer is causing the difference?
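
One way to find the layer responsible is to print the parameter count of each submodule. A minimal sketch, assuming the PyTorch model is an nn.Module; count_params is a hypothetical helper name, and the small demo model is only a stand-in, not your network:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> dict:
    """Return {child name: number of trainable parameters} for direct children."""
    return {
        name: sum(p.numel() for p in child.parameters() if p.requires_grad)
        for name, child in model.named_children()
    }

# Stand-in model for illustration only (hypothetical, not the poster's network).
demo = nn.Sequential(nn.Linear(26, 256), nn.ReLU(), nn.Linear(256, 414))

for name, n in count_params(demo).items():
    print(f"{name}: {n}")
print("total:", sum(count_params(demo).values()))
```

On the Keras side, model.summary() prints per-layer counts to compare against. Note that Keras lists the BatchNormalization moving mean and variance as non-trainable params, whereas PyTorch stores them as buffers, which parameters() does not return.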

Yes, I do.
I have two implementations of the TimeDistributed layer.
The Keras model has about 650K params; my first version has 8M params and the second has 385K params.
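
For reference on the TimeDistributed part: nn.Linear acts on the last dimension of its input, so applying it directly to a (batch, N_CONTEXT, N_INPUT) tensor shares one set of weights across all time steps, which is what TimeDistributed(Dense(...)) does in Keras. A small sketch to check this (the batch size of 4 is arbitrary):

```python
import torch
import torch.nn as nn

N_CONTEXT, N_INPUT, N_HIDDEN = 61, 26, 256

fc = nn.Linear(N_INPUT, N_HIDDEN)
x = torch.randn(4, N_CONTEXT, N_INPUT)

# Applied independently at every time step, with shared weights.
y = fc(x)

# Same result when the time axis is folded into the batch axis first.
y_flat = fc(x.reshape(-1, N_INPUT)).reshape(4, N_CONTEXT, N_HIDDEN)

n_params = sum(p.numel() for p in fc.parameters())
print(y.shape, n_params, torch.allclose(y, y_flat, atol=1e-5))
```

The count here is N_INPUT * N_HIDDEN + N_HIDDEN = 6912 and does not depend on N_CONTEXT. A Linear layer over the flattened N_CONTEXT * N_INPUT input would instead grow with the sequence length.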

Could you post the PyTorch implementation here so that we can compare them and see what’s causing this issue?
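
For comparison, here is a rough sketch of how the Keras model above might be ported. It is not a verified equivalent. The class name KerasLikeNet and the attribute names are made up for illustration. It assumes BatchNormalization normalizes over the feature axis (the Keras default, axis=-1) and keeps only the last GRU hidden state to mirror return_sequences=False:

```python
import torch
import torch.nn as nn

N_CONTEXT, N_INPUT, N_HIDDEN, N_OUTPUT = 61, 26, 256, 414

class KerasLikeNet(nn.Module):  # hypothetical name
    def __init__(self, p_drop: float = 0.1):
        super().__init__()
        self.fc_in = nn.Linear(N_INPUT, N_HIDDEN)   # acts per time step
        self.bn1 = nn.BatchNorm1d(N_HIDDEN)
        self.gru = nn.GRU(N_HIDDEN, N_HIDDEN, batch_first=True)
        self.bn2 = nn.BatchNorm1d(N_HIDDEN)
        self.drop = nn.Dropout(p_drop)
        self.fc_out = nn.Linear(N_HIDDEN, N_OUTPUT)

    def forward(self, x):                           # x: (batch, N_CONTEXT, N_INPUT)
        x = self.fc_in(x)                           # (batch, N_CONTEXT, N_HIDDEN)
        # BatchNorm1d expects (batch, channels, length), so swap axes around it.
        x = self.bn1(x.transpose(1, 2)).transpose(1, 2)
        x = self.drop(torch.relu(x))
        _, h = self.gru(x)                          # h: (num_layers, batch, N_HIDDEN)
        x = h[-1]                                   # final hidden state only
        x = self.drop(torch.relu(self.bn2(x)))
        return self.fc_out(x)                       # linear output, no activation

model = KerasLikeNet()
out = model(torch.randn(8, N_CONTEXT, N_INPUT))
n_params = sum(p.numel() for p in model.parameters())
print(out.shape, n_params)
```

Small differences from the Keras count are expected even if the architectures match: PyTorch's GRU always carries two bias vectors per gate, while the Keras GRU bias layout depends on its reset_after setting, and the BatchNorm running statistics are not included in parameters().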