Hi @ptrblck,
I added the training=True argument to batch_norm and dropout2d in the 2nd architecture as follows:
out = torch.nn.functional.batch_norm(out, running_mean=torch.zeros(out_channels), running_var=torch.ones(out_channels), training=True)
out = torch.nn.functional.dropout2d(out, p=drop_out, training=True)
and likewise for every layer.
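
For context, here is a minimal sketch of how the training flag behaves in the functional API. FunctionalBlock and its follow_mode parameter are hypothetical names, not the actual 2nd architecture. The sketch only illustrates that a hard-coded training=True keeps dropout and batch statistics active even after .eval(), whereas passing self.training follows the module's mode.

```python
import torch
import torch.nn.functional as F


class FunctionalBlock(torch.nn.Module):
    """Hypothetical block: batch norm + 2D dropout via the functional API."""

    def __init__(self, channels, p=0.5, follow_mode=True):
        super().__init__()
        self.p = p
        self.follow_mode = follow_mode
        # Buffers persist across calls, so running statistics can accumulate.
        self.register_buffer("running_mean", torch.zeros(channels))
        self.register_buffer("running_var", torch.ones(channels))

    def forward(self, x):
        # follow_mode=True: obey .train()/.eval(); False: always training=True.
        training = self.training if self.follow_mode else True
        x = F.batch_norm(x, self.running_mean, self.running_var, training=training)
        return F.dropout2d(x, p=self.p, training=training)


torch.manual_seed(0)
x = torch.randn(8, 4, 6, 6)

fixed = FunctionalBlock(4, follow_mode=False).eval()
following = FunctionalBlock(4, follow_mode=True).eval()

with torch.no_grad():
    # In eval mode the mode-following block is deterministic ...
    same = torch.equal(following(x), following(x))
    # ... while the hard-coded block still drops random channels.
    differs = not torch.equal(fixed(x), fixed(x))
```

In this sketch, the hard-coded variant also keeps updating its running statistics during evaluation, while the mode-following variant leaves them untouched.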
I’m getting the following results after 10 epochs:
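
| Batch | Time | TestLoss | TestAcc | TrainLoss | TrainAcc |
|---|---|---|---|---|---|
| 0 | 0.192 | 50.07 | 9.69 | 51.799 | 9.0 |
| 200 | 1.313 | 50.07 | 9.69 | 27.512 | 18.0 |
| 400 | 2.617 | 25.648 | 21.43 | 20.614 | 20.0 |
| ... | ... | ... | ... | ... | ... |
| 1400 | 8.938 | 10.299 | 25.65 | 12.213 | 17.0 |
| 1600 | 10.241 | 8.948 | 25.78 | 7.674 | 30.0 |
| 1800 | 11.537 | 7.973 | 26.16 | 7.099 | 27.0 |
| 2000 | 12.835 | 7.16 | 26.24 | 6.351 | 22.0 |
| ... | ... | ... | ... | ... | ... |
| 4800 | 30.327 | 3.46 | 27.8 | 2.886 | 34.0 |
| 4990 | 31.37 | 3.46 | 27.8 | 3.908 | 23.0 |
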
Now, if I use nn.Sequential with torch.nn modules (1st architecture) and keep everything else the same, the results after 10 epochs are as follows:
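
| Batch | Time | TestLoss | TestAcc | TrainLoss | TrainAcc |
|---|---|---|---|---|---|
| 0 | 0.203 | 2.303 | 6.99 | 2.434 | 9.0 |
| 200 | 1.338 | 2.303 | 6.99 | 1.586 | 41.0 |
| 400 | 2.684 | 1.392 | 49.09 | 1.444 | 49.0 |
| ... | ... | ... | ... | ... | ... |
| 1600 | 10.542 | 0.876 | 69.24 | 0.996 | 64.0 |
| 1800 | 11.857 | 0.833 | 70.8 | 1.116 | 55.0 |
| 2000 | 13.177 | 0.791 | 71.96 | 1.249 | 60.0 |
| ... | ... | ... | ... | ... | ... |
| 4600 | 29.704 | 0.634 | 78.15 | 0.641 | 77.0 |
| 4800 | 31.024 | 0.625 | 78.25 | 0.702 | 79.0 |
| 4990 | 32.095 | 0.625 | 78.25 | 0.906 | 63.0 |
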
Comparing the two sets of results (2nd architecture vs. 1st), two things differ sharply:
- Final test accuracy: 27.8% vs. 78.25%
- Initial test loss: 50.07 vs. 2.303
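
As a sanity check on the initial loss: assuming a 10-class problem (the class count is not stated here, so this is an assumption), a classifier that outputs equal logits for every class has a cross-entropy of ln(10) ≈ 2.303, which matches the 1st architecture's starting value. A minimal sketch, where num_classes and batch_size are illustrative values:

```python
import math
import torch

num_classes = 10  # assumption: not stated in the post
batch_size = 16   # arbitrary

criterion = torch.nn.CrossEntropyLoss()
logits = torch.zeros(batch_size, num_classes)          # uniform prediction
labels = torch.randint(0, num_classes, (batch_size,))  # any labels work here

uniform_loss = criterion(logits, labels).item()
print(round(uniform_loss, 3), round(math.log(num_classes), 3))  # prints 2.303 2.303
```

A starting loss near 50 therefore suggests that the logits are spread over a much wider range than an untrained network with normalized activations would usually produce.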
Here are more details, if you could take a look:
# variables
in_channels = 3
out_channels = 32
kernel_size = 5
stride = 1
kernel_size_p = 2
stride_p = 2
optimizer = torch.optim.Adam(Classifier.parameters(), lr)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(num_epochs):
    for _, (images, labels) in enumerate(train_loader):
        if Batches % 250 == 0:
            Classifier.eval()
            with torch.no_grad():
                correct = 0
                total = 0
                LastTestLoss = 0
                LastTestAcc = 0
                for _, (testimages, testlabels) in enumerate(test_loader):
                    output = Classifier(testimages)
                    LastTestLoss += criterion(output, testlabels).item()
                    predictions = torch.argmax(output, 1)
                    total += testlabels.shape[0]
                    correct += (predictions == testlabels).sum().float().item()
                LastTestAcc = correct * 100.0 / total
                LastTestLoss /= len(test_loader)
            Classifier.train()
        outputs = Classifier(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # Tracking Accuracy
        prediction = torch.argmax(outputs, 1)
        correct = (prediction == labels).sum().float().item()
        acc = 100.0 * (correct / labels.shape[0])
        ...
        Batches += 1
    scheduler.step()
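
The evaluation part of the loop above could also be factored into a helper. The sketch below is one possible way to do it, under assumptions: evaluate is a hypothetical name, and the tiny linear model and random tensors only stand in for Classifier and test_loader so that the block runs on its own.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion):
    """Return (mean loss per batch, accuracy in %) and restore the model's mode."""
    was_training = model.training
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            output = model(inputs)
            total_loss += criterion(output, targets).item()
            correct += (output.argmax(dim=1) == targets).sum().item()
            total += targets.shape[0]
    model.train(was_training)  # switch back to whatever mode was active before
    return total_loss / len(loader), 100.0 * correct / total


# Stand-in data and model (assumptions, not the setup from the post).
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,)))
test_loader = DataLoader(data, batch_size=16)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))

test_loss, test_acc = evaluate(model, test_loader, torch.nn.CrossEntropyLoss())
```

Keeping the mode switch inside the helper means the training loop cannot forget to call .train() again after evaluating. Note that this only helps if the layers actually respect the module's mode.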