Why does data augmentation decrease validation accuracy: pytorch/keras comparison

Your code looks generally good!
Could you try to apply the same weight initializations that are used in Keras to compare the models?
Here is a small example.
Also, could you post the Keras code, as there still might be some small differences?
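
As a rough sketch of the weight initialization suggestion above: Keras `Dense` and `Conv2D` layers default to a Glorot (Xavier) uniform kernel and a zero bias, so something like the following might bring the PyTorch model closer. The helper name `keras_style_init` and the small model are hypothetical placeholders, not your actual architecture.

```python
import torch
import torch.nn as nn


def keras_style_init(module):
    # Assumption: mimic the Keras defaults for Dense/Conv2D layers
    # (glorot_uniform kernel, zeros bias). Other layer types are left untouched.
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Hypothetical stand-in model; replace with your own network.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)

# apply() calls the function on every submodule and on the model itself.
model.apply(keras_style_init)
```

If your Keras model sets a custom `kernel_initializer` anywhere, the matching `torch.nn.init` function would need to be used for that layer instead.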

Some minor issues:

  • Variables are deprecated. Since PyTorch 0.4.0 you can use tensors directly.
  • It’s generally recommended to call the model directly instead of calling forward, since only the direct call runs any registered hooks. You could change self.forward(x) to self(x).
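
To illustrate both points, here is a minimal sketch. `TinyNet` is a made-up module for demonstration only. It passes a plain tensor without wrapping it in a `Variable`, and shows that a forward hook fires when the model is called directly but not when `forward` is called by hand.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = TinyNet()

# Record every time the forward hook runs.
calls = []
model.register_forward_hook(lambda mod, inp, out: calls.append("hook"))

# A plain tensor tracks gradients on its own; no Variable wrapper is needed.
x = torch.randn(3, 4, requires_grad=True)

out = model(x)              # goes through __call__, so the hook runs
out_fwd = model.forward(x)  # bypasses __call__, so the hook is skipped

out.sum().backward()        # gradients flow back to the plain tensor
```

Inside a module the same applies: `self(x)` goes through `__call__`, while `self.forward(x)` skips it.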