Your linear layer is not doing the same thing as a TimeDistributedDense in Keras. TimeDistributedDense applies the same dense layer to every time step, whereas you are only using the last time step and discarding everything else.
Have a look at my TimeDistributed wrapper here:
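The linked wrapper is not reproduced in this post, so what follows is not that code. It is only a minimal sketch of the general idea, assuming PyTorch and a batch-first tensor of shape (batch, time, features). The class body, the layer sizes, and the variable names are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class TimeDistributed(nn.Module):
    """Sketch (assumed, not the original wrapper): apply `module` to every time step."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # Inputs without a separate time axis go straight through.
        if x.dim() <= 2:
            return self.module(x)
        # Fold the two leading dims (batch and time) into a single dim,
        # so the wrapped module sees one big batch of individual time steps.
        d0, d1 = x.size(0), x.size(1)
        flat = x.reshape(d0 * d1, *x.shape[2:])
        out = self.module(flat)
        # Restore the two leading dims on the output.
        return out.reshape(d0, d1, *out.shape[1:])


# Illustrative sizes only: 2 sequences, 5 time steps, 8 input features.
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = TimeDistributed(nn.Linear(16, 4))

x = torch.randn(2, 5, 8)
out, _ = rnn(x)   # out holds every time step: shape (2, 5, 16)
y = head(out)     # one prediction per time step: shape (2, 5, 4)
```

The key difference from your code is that `head` receives all of `out`, not just `out[:, -1]`. Note that `nn.Linear` in current PyTorch already acts on the last dimension of an N-dimensional input, so for a plain linear layer you can also pass `out` to it directly; a wrapper like this matters more for modules that expect a fixed number of dimensions.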