I have the following CNN for CIFAR-10:

```
import torch.nn as nn


class ConvNet(nn.Module):  # class name is a placeholder
    def __init__(self, num_classes=10):
        super().__init__()
        # Three conv blocks: Conv -> BatchNorm -> ReLU (-> MaxPool for the first two)
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=3),
            nn.BatchNorm2d(256),
            nn.ReLU())
        # Average pooling, classifier, and softmax over the class dimension
        self.layer4 = nn.AvgPool2d(8)
        self.layer5 = nn.Linear(256, num_classes)
        self.layer6 = nn.Softmax(dim=1)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = out.reshape(out.size(0), -1)  # flatten to (batch, features)
        out = self.layer5(out)
        out = self.layer6(out)
        return out
```

```
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay = 0.0005)
```
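As a reference point for the loss setup above, here is a minimal sketch of how `nn.CrossEntropyLoss` treats its input. It applies log-softmax internally, so it expects raw logits; feeding it values that have already gone through a softmax applies the softmax twice. The tensors below are made-up values for illustration and are not taken from the training run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

# Made-up logits for a batch of 2 samples and 3 classes (illustration only).
logits = torch.tensor([[4.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
targets = torch.tensor([0, 1])

# CrossEntropyLoss is log_softmax followed by NLLLoss, so it takes raw logits.
loss_from_logits = criterion(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Passing probabilities means softmax is applied a second time inside the loss.
probs = torch.softmax(logits, dim=1)
loss_from_probs = criterion(probs, targets)

print(loss_from_logits.item(), manual.item(), loss_from_probs.item())
```

Because softmax outputs lie in [0, 1], the loss computed on them has a floor: with 10 classes it cannot fall below log(1 + 9/e), roughly 1.46, no matter how confident the prediction is.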

I train this network for 20 epochs with the following data augmentation methods:

1. Random crop (32, padding=4)
2. Random horizontal flip
3. Normalization
4. Random affine for horizontal and vertical translation
5. Mixup (alpha=1.0)
6. Cutout (num_holes=1, size=16)
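To make items 5 and 6 concrete, here is a minimal sketch of mixup and cutout with the parameters listed above. The function names (`mixup_batch`, `mixup_loss`, `cutout`) are hypothetical, and the details (zero fill, holes clipped at the image border, one mixing coefficient per batch) are assumptions that may differ from the implementation actually used.

```python
import torch


def mixup_batch(x, y, alpha=1.0):
    """Blend each sample with a randomly chosen partner from the same batch."""
    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    mixed_x = lam * x + (1.0 - lam) * x[perm]
    return mixed_x, y, y[perm], lam


def mixup_loss(criterion, outputs, y_a, y_b, lam):
    """Weight the loss for both label sets by the mixing coefficient."""
    return lam * criterion(outputs, y_a) + (1.0 - lam) * criterion(outputs, y_b)


def cutout(img, num_holes=1, size=16):
    """Zero out num_holes square patches of a (C, H, W) image tensor."""
    _, h, w = img.shape
    out = img.clone()
    for _ in range(num_holes):
        # Random hole center; the hole is clipped at the image border.
        cy = torch.randint(0, h, (1,)).item()
        cx = torch.randint(0, w, (1,)).item()
        y1, y2 = max(cy - size // 2, 0), min(cy + size // 2, h)
        x1, x2 = max(cx - size // 2, 0), min(cx + size // 2, w)
        out[:, y1:y2, x1:x2] = 0.0
    return out
```

With `size=16` on a 32x32 image, a single hole can remove up to a quarter of the pixels, and mixup with `alpha=1.0` draws the mixing coefficient uniformly from [0, 1], so both are fairly strong regularizers.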

Each time I add one of the augmentations listed after normalization (4, 5, or 6), my validation accuracy drops from 60% to 50%. I know this can happen when the model's capacity is low.

However, when I train this network in Keras for 20 epochs with the same data augmentation methods, I reach over 70% validation accuracy.

What am I missing?

Note: in the Keras implementation, the convolution and dense layers have L2 kernel regularization; in the PyTorch implementation, L2 is applied only through the optimizer's `weight_decay`. Could that be the reason?
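To illustrate the difference, here is a minimal sketch of restricting `weight_decay` to convolution and linear weights, which is closer to what a Keras kernel regularizer covers. Passing `weight_decay` to the optimizer for `model.parameters()` also decays biases and BatchNorm parameters. The helper name `split_decay_params`, the stand-in model `toy`, and the value of `keras_l2` are hypothetical. The factor of 2 assumes that Keras `regularizers.l2(l)` adds `l * sum(w**2)` to the loss (gradient `2 * l * w`), while SGD's `weight_decay` adds `weight_decay * w` to the gradient.

```python
import torch
import torch.nn as nn


def split_decay_params(model):
    """Separate conv/linear weights from biases and BatchNorm parameters."""
    decay, no_decay = [], []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if isinstance(module, (nn.Conv2d, nn.Linear)) and name == "weight":
                decay.append(param)
            else:
                no_decay.append(param)
    return decay, no_decay


# Stand-in model for illustration only; not the network from the question.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

keras_l2 = 0.0005  # hypothetical Keras l2 coefficient
decay, no_decay = split_decay_params(toy)
optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 2 * keras_l2},  # kernels only
        {"params": no_decay, "weight_decay": 0.0},        # biases, BatchNorm
    ],
    lr=0.01,
    momentum=0.9,
)
```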