Very different variational autoencoder results from Keras to PyTorch

I’m not sure the two loss functions are equivalent.
It seems the losses in Keras are averaged (I assume divided by 2), while in PyTorch you are summing them. Both KL terms are multiplied by 0.5, but again Keras uses K.sum while the PyTorch model uses torch.mean. If so, the losses would differ by large constant factors (the batch size and the feature dimensions), which effectively rescales the gradients and thus the learning rate.
I’m not that familiar with Keras, and both implementations might still yield the same result, but just from skimming the code these lines looked a bit suspicious.
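
To illustrate the scale difference, here is a minimal sketch (the shapes and the BCE reconstruction loss are made-up assumptions, not taken from your code) comparing a summed vs. an averaged VAE loss in PyTorch:

```python
import torch
import torch.nn.functional as F

# Hypothetical tensors standing in for one batch; all shapes are assumptions.
x = torch.rand(32, 784)       # inputs in [0, 1]
recon = torch.rand(32, 784)   # decoder outputs in [0, 1]
mu = torch.randn(32, 20)      # latent means
logvar = torch.randn(32, 20)  # latent log-variances

# Summed formulation (common in PyTorch VAE examples):
# reconstruction summed over all elements, KL summed as well.
recon_sum = F.binary_cross_entropy(recon, x, reduction='sum')
kl_sum = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss_sum = recon_sum + kl_sum

# Averaged formulation (closer to what Keras' reductions would produce):
recon_mean = F.binary_cross_entropy(recon, x, reduction='mean')
kl_mean = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss_mean = recon_mean + kl_mean

# The two losses differ by large constant factors (batch size and
# feature dimensions), so the gradient magnitudes differ too.
print(loss_sum.item(), loss_mean.item())
```

If the reductions really do differ between the two models, you could either make them match or compensate with the learning rate, but matching them directly is the cleaner check.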

Also, I’m not sure how the padding in Keras works exactly, but if you’ve already compared the activation shapes of both models, that part should be fine.
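
If you still want to double-check the padding, here is a quick sketch (the layer configuration is a made-up example, not your model). Keras’ padding='same' with stride 1 keeps the spatial size, which in PyTorch you would reproduce with an explicit padding argument:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)

# For an odd kernel size k and stride 1, padding=k // 2 mimics
# Keras' padding='same' and preserves the spatial dimensions.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(conv(x).shape)  # torch.Size([1, 16, 28, 28])
```

Printing the shapes after each layer in both models and comparing them side by side should quickly reveal any mismatch.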