CNN with torch.nn.functional vs. with torch.nn

Hi @ptrblck,
I included the training=True argument for batch_norm and dropout2d in the 2nd architecture as follows:

out = torch.nn.functional.batch_norm(out, running_mean=torch.zeros(out_channels), running_var=torch.ones(out_channels), training=True)
out = torch.nn.functional.dropout2d(out, p=drop_out, training=True)

and likewise for every layer.
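
To make the pattern concrete, one such call sequence looks like this in isolation (a runnable sketch on a dummy activation tensor; the drop_out value here is only for the sketch):

import torch
import torch.nn.functional as F

out_channels = 32
drop_out = 0.5  # illustrative value for this sketch

# dummy activation map standing in for a conv layer's output (N, C, H, W)
out = torch.randn(8, out_channels, 14, 14)

# fresh zero/one statistics are created on every call, and no affine
# weight/bias tensors are passed
out = F.batch_norm(out,
                   running_mean=torch.zeros(out_channels),
                   running_var=torch.ones(out_channels),
                   training=True)
out = F.dropout2d(out, p=drop_out, training=True)
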
I’m getting the following results after 10 epochs:

Batch     Time    TestLoss     TestAcc    TrainLoss TrainAcc          
0         0.192    50.07        9.69        51.799   9.0              
200       1.313    50.07        9.69        27.512   18.0             
400       2.617    25.648       21.43       20.614   20.0             
.
.
.
1400      8.938    10.299       25.65       12.213   17.0            
1600      10.241   8.948        25.78       7.674    30.0             
1800      11.537   7.973        26.16       7.099    27.0             
2000      12.835   7.16         26.24       6.351    22.0             

.
.
.   
4800      30.327   3.46         27.8        2.886    34.0            
4990      31.37    3.46         27.8        3.908    23.0    

Now, if I use nn.Sequential with torch.nn modules (the 1st architecture), keeping everything else the same, the results after 10 epochs are as follows (one block of this architecture is sketched below, after the table):


Batch     Time    TestLoss     TestAcc    TrainLoss TrainAcc          
0         0.203    2.303        6.99        2.434    9.0          
200       1.338    2.303        6.99        1.586    41.0            
400       2.684    1.392        49.09       1.444    49.0            
.
.
.          
1600      10.542   0.876        69.24       0.996    64.0           
1800      11.857   0.833        70.8        1.116    55.0            
2000      13.177   0.791        71.96       1.249    60.0            
.
.
.    
4600      29.704   0.634        78.15       0.641    77.0             
4800      31.024   0.625        78.25       0.702    79.0           
4990      32.095   0.625        78.25       0.906    63.0   
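
For reference, one block of the 1st architecture is built from the corresponding nn modules inside nn.Sequential, roughly like this (a simplified sketch; the activation and exact layer ordering are only meant to show the pattern):

import torch.nn as nn

# values mirror the variables listed further below; drop_out is illustrative
in_channels, out_channels = 3, 32
kernel_size, stride = 5, 1
kernel_size_p, stride_p = 2, 2
drop_out = 0.5

block = nn.Sequential(
    nn.Conv2d(in_channels, out_channels, kernel_size, stride),
    nn.BatchNorm2d(out_channels),   # the module owns weight/bias and running stats
    nn.ReLU(),
    nn.Dropout2d(p=drop_out),
    nn.MaxPool2d(kernel_size_p, stride_p),
)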

Comparing the two sets of results, two things stand out as contradictory:

  1. Final test accuracy: 27.8% vs. 78.25%
  2. Initial test loss: 50.07 vs. 2.303 (see the quick check after this list)
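
On point 2: with 10 classes (which the ~2.303 starting loss and ~10% starting accuracy suggest), uniform predictions give a cross-entropy of -ln(1/10) ≈ 2.303, so the 1st architecture starts at the expected random-guess loss while the 2nd starts far above it. A quick check:

import math
import torch

criterion = torch.nn.CrossEntropyLoss()
logits = torch.zeros(1, 10)   # all-equal logits -> uniform softmax over 10 classes
target = torch.tensor([3])    # any class index
print(criterion(logits, target).item(), math.log(10))  # both ~2.3026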

Here are more details in case you could take a look:

# variables

in_channels = 3
out_channels = 32
kernel_size = 5
stride = 1
kernel_size_p = 2
stride_p = 2

optimizer = torch.optim.Adam(Classifier.parameters(), lr)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
criterion = torch.nn.CrossEntropyLoss()
Batches = 0
for epoch in range(num_epochs):

    for _, (images, labels) in enumerate(train_loader):
        # evaluate on the test set every 250 batches
        if Batches % 250 == 0:
            Classifier.eval()
            with torch.no_grad():
                correct = 0
                total = 0
                LastTestLoss = 0
                LastTestAcc = 0
                for _, (testimages, testlabels) in enumerate(test_loader):
                    output = Classifier(testimages)
                    LastTestLoss += criterion(output, testlabels).item()
                    predictions = torch.argmax(output, 1)
                    total += testlabels.shape[0]
                    correct += (predictions == testlabels).sum().float().item()
                LastTestAcc = correct * 100.0 / total
                LastTestLoss /= len(test_loader)
            Classifier.train()

        # forward pass and optimization step
        outputs = Classifier(images)
        loss = criterion(outputs, labels)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # tracking accuracy on the current training batch
        prediction = torch.argmax(outputs, 1)
        correct = (prediction == labels).sum().float().item()
        acc = 100.0 * (correct / labels.shape[0])

        ...
        Batches += 1
    scheduler.step()
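
Lastly, the ExponentialLR schedule with gamma=0.9 multiplies the learning rate by 0.9 on every scheduler.step(), i.e. once per epoch here. A throwaway sketch (separate from the script above; the lr value is illustrative) to see the decay:

import torch

param = torch.nn.Parameter(torch.zeros(1))       # dummy parameter
optimizer = torch.optim.Adam([param], lr=1e-3)   # illustrative lr
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
    optimizer.step()                        # stands in for one epoch of training
    scheduler.step()
    print(epoch, scheduler.get_last_lr())   # lr becomes 1e-3 * 0.9 ** (epoch + 1)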