Can anyone check my mistake and explain to me how to correct the error?

My code:

import torch
import torchvision
import torchvision.transforms as transforms

transform=transforms.Compose([transforms.Grayscale(3),transforms.Resize(256),transforms.ToTensor(),
                            transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])

trainset=torchvision.datasets.MNIST(root='./data',train=True,download=True,transform=transform)
trainloader=torch.utils.data.DataLoader(trainset,batch_size=32,shuffle=True,num_workers=2)

testset=torchvision.datasets.MNIST(root='./data',train=False,download=True,transform=transform)
testloader=torch.utils.data.DataLoader(testset,batch_size=32,shuffle=False,num_workers=2)

import torch.nn as nn
import torch.nn.functional as F

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

import torchvision.models as models
alexnet=models.alexnet(pretrained=True)
alexnet.classifier[6]=nn.Linear(4096,10)
print(alexnet)
print('Model Downloaded')

class MnistAlexnet(nn.Module):

        def __init__(self,alexnet):
            super(MnistAlexnet,self).__init__()
            self.alexnet=alexnet

        def forward(self,x):
            return nn.Softmax(super(MnistAlexnet,self).forward(x),dim=-1)

model=MnistAlexnet(alexnet)  
print(model)


import time
import torch.optim as optim

batch_size=128
epochs=5
model.to(device)
criterion=nn.CrossEntropyLoss()
optimizer=optim.Adam(alexnet.parameters(),lr=0.001)


for epoch in range(epochs):
        running_loss=0.0
        total=0
        correct_classified=0
        start_time=time.time()
        model.train()
        for i,data in enumerate(trainloader):
            inputs,labels=data
            inputs,labels=inputs.to(device),labels.to(device)
            optimizer.zero_grad()
            outputs=model(inputs)
            loss=criterion(outputs,labels)
            loss.backward()
            optimizer.step()
            _,predicted=torch.max(outputs.data,1)
            total+=labels.size(0)
            correct_classified+=(predicted==labels).sum().item()
            running_loss+=loss.item()
            if i % 200 == 199:
               avg_loss=running_loss/200
               print('Epoch:[%d, %5d] Batch: %5d loss: %.3f' % (epoch+1,i+1,i+1,avg_loss))
            running_loss=0.0
            train_acc=(100*correct_classified/total)
       # print("Time/epoch: {} sec".format(time.time()-start_time))
       # train_acc=(correct_classified/total)
       # print('Train Accuracy :%d'%(train_acc))
        c=0
        total=0
        model.eval()
        with torch.no_grad():
            for data in testloader:
               images,labels=data
               inputs,labels=images.to(device),labels.to(device)
               outputs=model(inputs)
               _,predicted=torch.max(outputs.data,1)
               total+=labels.size(0)
               c+=(predicted==labels).sum().item()
               test_acc=(100*c/total)

        print("c=",c," total=",total)
        print('Accuracy of the network on test images:%d %%' % test_acc)



print('trained')
                     


Error message:
Traceback (most recent call last):
  File "editAlexnet_MNIST2.py", line 58, in <module>
    outputs=model(inputs)
  File "/home/anaconda2/envs/pytorch_v36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "editAlexnet_MNIST2.py", line 32, in forward
    return nn.Softmax(super(MnistAlexnet,self).forward(x),dim=-1)
  File "/home/anaconda2/envs/pytorch_v36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 88, in forward
    raise NotImplementedError
NotImplementedError

Please note: I am new to this. I was trying to apply the softmax function in the return statement of the forward function. Please help.

The forward method of MnistAlexnet contains an error.
Currently you are trying to call super(MnistAlexnet,self).forward(x), i.e. the forward method of nn.Module, which is not implemented.
Instead just use:

class MnistAlexnet(nn.Module):

        def __init__(self,alexnet):
            super(MnistAlexnet,self).__init__()
            self.alexnet=alexnet

        def forward(self,x):
            return self.alexnet(x)

You don’t need to apply a softmax to your outputs, as nn.CrossEntropyLoss expects logits and applies F.log_softmax and nn.NLLLoss internally.
Also, you shouldn’t call forward directly, as this might break hooks, so just call the module directly.
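As a quick sanity check (just a toy sketch with random tensors, not part of your model), you can verify that nn.CrossEntropyLoss on the raw logits already gives the same loss as F.log_softmax followed by nn.NLLLoss:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)           # toy batch: 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))  # toy class labels

loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_ce, loss_nll))  # True, so no extra softmax is needed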

PS: I’ve formatted your code to be able to copy it directly for debugging.
You can add code by wrapping it in three backticks ``` :wink:


@ptrblck thank you for your reply.
Here I am trying to train on the MNIST dataset using a pretrained AlexNet. Now I want to apply the softmax function to the output of each image to get an idea of which of the digits 0-9 the image belongs to. Please tell me where I should apply the softmax function.
Please note: the training accuracy is very low, around 11%. After each epoch I printed the correct_classified and total variables and found that correct_classified is around 1135 while total is 10000, so the accuracy is about 11%.

I am new to this, please help me with it. Thank you.

To get the prediction, you can use torch.argmax(output, 1).
The logits will give you the same prediction as the softmax output.

If you would like to see the “probabilities” of your output, you could also use

probs = F.softmax(output, 1)

but don’t pass probs to your criterion, as this might result in bad training.
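For example, inside your training loop it could look like this (a sketch assuming model, inputs, labels, and criterion are the ones from your script):

outputs = model(inputs)                       # raw logits, shape [batch_size, 10]
loss = criterion(outputs, labels)             # the criterion gets the logits, not probs
predicted = torch.argmax(outputs, 1)          # predicted digit per image
probs = F.softmax(outputs, 1)                 # "probabilities", only for printing/inspection
correct = (predicted == labels).sum().item()  # same count as with the torch.max indices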


@ptrblck
Thank you for your reply, now I can see the probabilities.
How do I increase the accuracy? Am I doing something very wrong, as I am getting only 11%?

Could you post the current model definition and training code so that we can have a look?


@ptrblck here is my full code:

import torch
import torchvision
import torchvision.transforms as transforms

transform=transforms.Compose([transforms.Grayscale(3),transforms.Resize(256),transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])

trainset=torchvision.datasets.MNIST(root='./data',train=True,download=True,transform=transform)
trainloader=torch.utils.data.DataLoader(trainset,batch_size=32,shuffle=True,num_workers=2)

testset=torchvision.datasets.MNIST(root='./data',train=False,download=True,transform=transform)
testloader=torch.utils.data.DataLoader(testset,batch_size=32,shuffle=False,num_workers=2)

import torch.nn as nn
import torch.nn.functional as F

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

import torchvision.models as models
alexnet=models.alexnet(pretrained=True)
alexnet.classifier[6]=nn.Linear(4096,10)
print(alexnet)
print('Model Downloaded')

class MnistAlexnet(nn.Module):

        def __init__(self,alexnet):
            super(MnistAlexnet,self).__init__()
            self.alexnet=alexnet

        def forward(self,x):
            return self.alexnet(x)

model=MnistAlexnet(alexnet)  
print(model)


import time
import torch.optim as optim

batch_size=128
epochs=5
model.to(device)
criterion=nn.CrossEntropyLoss()
optimizer=optim.Adam(alexnet.parameters(),lr=0.001)


for epoch in range(epochs):
        running_loss=0.0
        total=0
        probs=0
        correct_classified=0
        start_time=time.time()
        model.train()
        for i,data in enumerate(trainloader):
            inputs,labels=data
            inputs,labels=inputs.to(device),labels.to(device)
            optimizer.zero_grad()
            outputs=model(inputs)
            loss=criterion(outputs,labels)
            loss.backward()
            optimizer.step()
           # _,predicted=torch.max(outputs.data,1)
            predicted=torch.argmax(outputs,1)
            total+=labels.size(0)
            probs=F.softmax(outputs,1)[1]
            #print("probs=",probs)
            correct_classified+=(predicted==labels).sum().item()
            running_loss+=loss.item()
            if i % 200 == 199:
               avg_loss=running_loss/200
               print('Epoch:[%d, %5d] Batch: %5d loss: %.3f' % (epoch+1,i+1,i+1,avg_loss))
            running_loss=0.0
            train_acc=(100*correct_classified/total)
       # print("Time/epoch: {} sec".format(time.time()-start_time))
       # train_acc=(correct_classified/total)
       # print('Train Accuracy :%d'%(train_acc))
        c=0
        total=0
        model.eval()
        with torch.no_grad():
            for data in testloader:
               images,labels=data
               inputs,labels=images.to(device),labels.to(device)
               outputs=model(inputs)
              # _,predicted=torch.max(outputs.data,1)
               predicted=torch.argmax(outputs,1)
               total+=labels.size(0)
               c+=(predicted==labels).sum().item()
               test_acc=(100*c/total)

        print("c=",c," total=",total)
        print('Accuracy of the network on test images:%d %%' % test_acc)



print('trained')

@ptrblck
The accuracy improved once I changed to predicted=torch.argmax(outputs,1) in the test part of the code; right now it is showing about 97% test accuracy. Thank you very much, sir. Although why torch.max(outputs,1) was causing trouble is still a mystery to me; I would like to know about it.

torch.max returns the max values and their corresponding indices.
I’m not sure if that was really the issue or if your current setup is just unstable, i.e. if I set the learning rate to 1e-4 the model converges better.
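For what it’s worth, here is a small sketch (with a random toy tensor) showing that the indices returned by torch.max match torch.argmax, so both should give the same predictions:

output = torch.randn(4, 10)             # toy logits
values, indices = torch.max(output, 1)  # max value and its index per row
print(torch.equal(indices, torch.argmax(output, 1)))  # True

# the lower learning rate I mentioned would be set e.g. via:
# optimizer = optim.Adam(model.parameters(), lr=1e-4)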


@ptrblck
I do not know about that, sir. Maybe it was my fault, as I’m using the server at IIT Kharagpur and lots of people are working on it. Thank you very much, sir, for helping me out.
