Always Output of 0?

Losspost · July 26, 2018, 8:44pm

Hi.
I have a network that should do the following task.
It gets an image as input and should give me two outputs. The first output is the coordinates the mouse should move to; the second output is whether it should click or not. At first I thought it could be overfitting. But since there are far more “click” (1) than “no click” (0) values, I am wondering why it still always outputs 0.

My code:


import torch
import torchvision
import torchvision.transforms as transforms

import os
from PIL import Image
from CustomDataset import CustomMouseDataset,Rescale


def load_data():
#    transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
#    
#    #Load Recoreded Data
#    with h5py.File('data/video_data_22_7_2018_17_46','r') as data:
#        video = data['video'][()]
#        mouse = data['mouse'][()]
#    video = video[:50]
#    mouse = mouse[:50]    
    transform = transforms.Compose([transforms.ToTensor()])
    train_data = CustomMouseDataset('data/video_data_22_7_2018_17_46',transform)
    
    
    train_loader = torch.utils.data.DataLoader(train_data,batch_size=10,shuffle=True)
    
    return train_loader

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.pool1 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(6*16*16, 20)
        
        self.fc2a = nn.Linear(20, 2) # Regression
        self.fc2b = nn.Linear(20, 1) # Classification
        
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        
        x1 = self.fc2a(x)
        x2 = F.log_softmax(self.fc2b(x), dim=1)
        return x1, x2
def main():
    pass

if __name__ == "__main__":    
    net = Net()
    train_data = load_data()
    
    import torch.optim as optim
    criterion = nn.MSELoss()
    criterion2 = nn.NLLLoss()
    optimizer = optim.SGD(net.parameters(), lr = 0.001, momentum = 0.9)
    
    #Train the Network
    for epoch in range(10):
        running_loss = 0.0
        for i,data in enumerate(train_data,0):
            inputs = data['frame']
          
            labels = data['mouse']
            
            target_1 = labels[:,:2]
            target_2 = labels[:,2].unsqueeze(1)
    #        print(type(target_2))
    #        print(target_2.shape)
        
    
            #Zero gradients Parameter
            optimizer.zero_grad()
            
            #forward + backward +optimize
            output1,output2 = net(inputs)
           
            loss1 = criterion(output1,target_1)/2000
            loss2 = criterion(output2,target_2)
            loss = loss1 + loss2
            loss.backward()
            optimizer.step()
            running_loss = loss
            if i % 300 == 0:    # print every 300 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 5))
                print(output1,output2)
                running_loss = 0.0
                
    try:            
        torch.save(net.state_dict(),'Model/model_save')
    except:
        os.mkdir("Model")
        torch.save(net.state_dict(),'Model/model_save')
    print('Finished Training')

Example Output:

tensor([[ 500.9624,  829.2684],
        [ 442.0272,  770.7896],
        [ 534.6215,  858.7812],
        [ 465.2378,  793.0269],
        [ 516.5006,  844.0679],
        [ 512.5015,  837.9688],
        [ 469.4029,  797.4636],
        [ 462.6033,  787.3254],
        [ 453.6012,  784.0760],
        [ 503.6086,  833.6633]]) tensor([[ 0.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 0.]])

ptrblck · July 26, 2018, 8:51pm

Your classification path has just one output, i.e. one class.
If you use F.log_softmax(x, dim=1) on it, the softmax over that single value is always 1, so its log is always 0, no matter what the layer predicts.

Change it to self.fc2b = nn.Linear(20, 2) for a two-class classification as suggested here.

Alternatively, you could use nn.BCELoss with a single output and F.sigmoid.
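To make the effect concrete, here is a minimal plain-Python sketch. The log_softmax function below is a hand-written stand-in for illustration, not PyTorch's implementation; it only shows that normalizing over a single entry always yields 0, while two entries give outputs that depend on the logits.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits (illustrative)."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum_exp for v in row]

# One logit per sample: the result is 0 regardless of the logit value.
single = [log_softmax([z]) for z in (-3.0, 0.0, 7.5)]
print(single)

# Two logits per sample: the outputs now depend on the logits.
double = log_softmax([1.0, 2.0])
print(double)
```

With one output there is nothing to normalize against, which is why the printed classification tensor in the question is all zeros.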

Losspost · July 26, 2018, 9:16pm

Is there a page or some other way to learn which loss function, optimizer, or activation function I have to use?
I used different tutorials, but they didn't explain this point.

ptrblck · July 26, 2018, 9:45pm

You can find the different loss functions in the docs.

I assume you’ve already found the PyTorch tutorials and would like to get started creating your own models.
There are a lot of good resources to learn more, e.g. Stanford’s CS231n for Visual Recognition (with free lecture videos), fast.ai’s course (they use a high-level wrapper built on top of PyTorch) or Andrew Ng’s Coursera course.

For the beginning you could stick to the following (this is my biased opinion and the recommendations might not be the best for your use case!):

For regression, try nn.MSELoss() and no non-linearity for your model output. Also normalizing the target to [0, 1] or [-1, 1] might help.
For classification use F.log_softmax + nn.NLLLoss or no non-linearity + nn.CrossEntropyLoss.
Try optim.Adam as the default optimizer.
Try nn.ReLU as your default non-linearity between layers.
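Put together, these defaults could look roughly like the sketch below. It is only an illustration under assumptions: TwoHeadNet, fc_coords and fc_click are hypothetical names, the batch is random stand-in data for 32x32 grayscale frames, and the coordinate targets are assumed to be already scaled to [0, 1].

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

class TwoHeadNet(nn.Module):
    """Hypothetical stand-in for the Net in this thread."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.pool1 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(6 * 16 * 16, 20)
        self.fc_coords = nn.Linear(20, 2)  # regression head, no non-linearity
        self.fc_click = nn.Linear(20, 2)   # two logits: "no click" / "click"

    def forward(self, x):
        x = self.pool1(torch.relu(self.conv1(x)))
        x = torch.relu(self.fc1(x.view(x.size(0), -1)))
        return self.fc_coords(x), self.fc_click(x)

net = TwoHeadNet()
regression_loss = nn.MSELoss()
classification_loss = nn.CrossEntropyLoss()  # takes raw logits and class indices
optimizer = optim.Adam(net.parameters(), lr=1e-3)

# Random stand-in batch: 10 grayscale 32x32 frames.
frames = torch.rand(10, 1, 32, 32)
coords = torch.rand(10, 2)            # regression targets, assumed scaled to [0, 1]
clicks = torch.randint(0, 2, (10,))   # class indices, dtype long, shape (N,)

# One training step: zero gradients, forward, combined loss, backward, update.
optimizer.zero_grad()
pred_coords, click_logits = net(frames)
loss = regression_loss(pred_coords, coords) + classification_loss(click_logits, clicks)
loss.backward()
optimizer.step()

predicted_click = click_logits.argmax(dim=1)  # 0 or 1 per sample
print(loss.item(), predicted_click)
```

Note that nn.CrossEntropyLoss expects the class targets as a 1D tensor of indices, so the click column would be passed without unsqueeze(1) and cast to long.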

Once your models converge you can tweak your code in a fancy way and e.g. use skip connections, cyclic learning rates etc. The deeplearningbook is also a great resource.

Losspost · July 27, 2018, 5:23pm

I have one last question. If I now add a new layer to the network, how can I calculate the input size of the first linear layer?

ptrblck · July 27, 2018, 5:39pm

Have a look at the output formula for nn.Conv2d and nn.MaxPool2d.
E.g. for a kernel_size=3 you will “lose” 2 pixels in height and width if you don’t pad. Adding padding=1 will keep the same shape.
MaxPool2d with a kernel_size=2 and stride=2 will reduce the spatial dimensions by a factor of 2.

These are just common values for these layers and you can design your model as you wish.
For the last layer you would have to multiply the channels by the height and width.
In your example you have 6 channels and a spatial size of 16x16.
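As a rough sketch, the output-size formula from the nn.Conv2d and nn.MaxPool2d docs can be written as a small helper. out_size is a hypothetical name, and the helper assumes square inputs and the default ceil_mode=False for pooling.

```python
def out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

size = 32                                            # 32x32 input image
size = out_size(size, kernel_size=3, padding=1)      # conv1 with padding=1 keeps the size
after_conv = size
size = out_size(size, kernel_size=2, stride=2)       # MaxPool2d(2) halves the size
after_pool = size

in_features = 6 * after_pool * after_pool            # channels * height * width
print(after_conv, after_pool, in_features)

# Without padding, a 3x3 kernel loses 2 pixels per dimension.
unpadded = out_size(32, kernel_size=3)
print(unpadded)
```

For the network in the question this gives 6*16*16 input features for the first linear layer.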

Losspost · July 27, 2018, 5:49pm

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.conv2 = nn.Conv2d(6,12,3,1,1)
        self.pool1 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(12*32*32, 20)
        
        self.fc2a = nn.Linear(20, 2) # Regression
        self.fc2b = nn.Linear(20, 1) # Classification

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool1(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        
        x1 = self.fc2a(x)
        #x2 = F.log_softmax(self.fc2b(x), dim=1)
        x2 = F.sigmoid(self.fc2b(x))
        return x1, x2

With the calculation I always get 32x32, so what is wrong there?

ptrblck · July 27, 2018, 8:54pm

What is your input size? Note that you are pooling twice.

Losspost · July 27, 2018, 10:27pm

My input image is 32x32 with 1 channel (grayscale).

ptrblck · July 27, 2018, 10:34pm

Since you are pooling twice, the spatial dimensions will be 32/2/2=8. Your linear layer should thus take 12*8*8 input features.

If you don’t want to calculate it, you could also print the shape of your tensor before the view operation and just use that size.
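One way to do that, sketched here under the assumption of a 32x32 single-channel input, is to push a dummy tensor through the conv and pooling layers and read off the shape. The features container is a hypothetical name that mirrors the two conv and pool stages from the post above.

```python
import torch
import torch.nn as nn

# Same conv/pool stack as in the post above, grouped for convenience.
features = nn.Sequential(
    nn.Conv2d(1, 6, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 12, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2),
)

# Dummy batch with one 32x32 grayscale image.
with torch.no_grad():
    out = features(torch.zeros(1, 1, 32, 32))

print(out.shape)                      # shape right before the view operation
n_features = out.view(1, -1).size(1)  # flattened size per sample
fc1 = nn.Linear(n_features, 20)       # first linear layer sized from the data
print(n_features)
```

This avoids hard-coding the size, so the linear layer stays correct if you later add or remove layers.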