Sorry, this isn't a proper answer, just a comment: you might want to describe what your task is, what your input and output look like, and maybe include an image of the model architecture.
At least for me, just looking at the Keras code doesn't help at all, although I have played with Keras a bit. I never fully grasped the TimeDistributed layer, so I also wonder whether there's really a need to re-implement that layer in Python. For example, a ReLU and a MaxPool2d layer after a Conv2d layer is pretty standard for CNNs, and no PyTorch CNN I've ever seen requires this TimeDistributed work-around.
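
To make that concrete, here is a minimal sketch of how I would apply a plain Conv2d, ReLU, MaxPool2d stack to every time step without a TimeDistributed wrapper. I'm assuming your input is a batch of frame sequences shaped (batch, time, channels, height, width); the FrameCNN name and all layer sizes are made up for illustration and are not taken from your Keras code.

```python
import torch
import torch.nn as nn


class FrameCNN(nn.Module):
    """Hypothetical example: run the same small CNN on every frame of a sequence."""

    def __init__(self, in_channels=3, out_channels=8):
        super().__init__()
        # The standard Conv2d -> ReLU -> MaxPool2d stack, nothing sequence-specific.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )

    def forward(self, x):
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold the time axis into the batch axis so Conv2d sees ordinary 4D input.
        y = self.features(x.reshape(b * t, c, h, w))
        # Unfold again: (batch, time, out_channels, height / 2, width / 2)
        return y.reshape(b, t, *y.shape[1:])


x = torch.randn(2, 5, 3, 16, 16)  # 2 sequences of 5 RGB frames, 16x16 each
model = FrameCNN()
out = model(x)
print(out.shape)  # torch.Size([2, 5, 8, 8, 8])
```

As far as I understand it, folding the time axis into the batch axis and unfolding it afterwards is all that is needed here, since the same weights are applied to every frame independently. If your model does something different per time step, this sketch won't cover it, which is another reason a description of the task would help.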