Import video in the form of a numpy array in PyTorch

The positional arguments of nn.Conv2d are the number of input channels, the number of output channels, and the kernel size. Your call should look like that example.
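A minimal sketch of that argument order. It assumes grayscale frames (one channel) in a dummy [N, C, H, W] batch of two 560x656 images; the names conv_gray and conv_rgb are illustrative only. The first argument of nn.Conv2d has to match dim 1 of the input tensor:

```python
import torch
import torch.nn as nn

# nn.Conv2d(in_channels, out_channels, kernel_size)
conv_gray = nn.Conv2d(1, 6, 5)  # for single-channel (grayscale) frames
conv_rgb = nn.Conv2d(3, 6, 5)   # for three-channel (RGB) frames

frames = torch.randn(2, 1, 560, 656)  # dummy batch: [N, C, H, W] with C = 1

out = conv_gray(frames)
print(out.shape)  # torch.Size([2, 6, 556, 652]): a 5x5 kernel without padding shrinks H and W by 4

# conv_rgb(frames) would fail with an "expected input ... to have 3 channels" error,
# because dim 1 of `frames` is 1, not 3.
```

The weight of conv_gray has shape [out_channels, in_channels, kH, kW] = [6, 1, 5, 5], which is the same layout the error message reports as weight[6, 3, 5, 5] for the RGB version.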

Then it complains about this:


RuntimeError: Given groups=1, weight[6, 3, 5, 5], so expected input[10, 1, 560, 656] to have 3 channels, but got 1 channels instead

Or, if I try it with just one input channel, since the image only has one:

RuntimeError: invalid argument 2: size '[-1 x 400]' is invalid for input with 3529120 elements at ..\src\TH\THStorage.c:37

For my first error:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3,6,5)  # expects 3-channel (RGB) input
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

And for my second error:

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1,6,5)  # 1 input channel for grayscale frames
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # assumes 16*5*5 features per sample, i.e. a 32x32 input
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Can you update your code to show what you have now?

Hmm, I still get the error:

    x = x.view(-1, 16 * 5 * 5)

RuntimeError: invalid argument 2: size '[-1 x 400]' is invalid for input with 134560 elements at ..\src\TH\THStorage.c:37

Maybe it has something to do with
x = x.view(-1, 16*5*5)?

Apparently something goes wrong when flattening the data.
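The hard-coded 400 (= 16*5*5) is only the per-sample feature count for a 32x32 input. Assuming a batch of 10 (the batch_size used later in the thread), the 134560 elements in the error are 13456 per sample, i.e. 16*29*29, which is what the two conv/pool stages produce from a 128x128 single-channel frame. The 128x128 size is inferred from that arithmetic, not stated in the thread. A sketch that reproduces the numbers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Conv/pool part of the Net above (1 input channel)
conv1 = nn.Conv2d(1, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

def features(x):
    x = pool(F.relu(conv1(x)))  # 128 -> 124 -> 62
    x = pool(F.relu(conv2(x)))  # 62 -> 58 -> 29
    return x

# Assumption: batch of 10 single-channel 128x128 frames
x = features(torch.randn(10, 1, 128, 128))
print(x.shape)    # torch.Size([10, 16, 29, 29])
print(x.numel())  # 134560, not a multiple of 400, hence the view() error

# Flattening per sample keeps the batch dimension for any input size
flat = x.view(x.size(0), -1)
print(flat.shape)  # torch.Size([10, 13456])
```

With this input size, fc1 would need in_features=16*29*29. Note that x.view(-1, 400) can also "succeed" silently when the element count happens to be divisible by 400, and then it mixes features of different samples into one row; x.view(x.size(0), -1) avoids that.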

OK, it works so far. But now I have problems with my labels.

Apparently the labels have the wrong shape?

The error is:

  torch.Size([10, 3])
Traceback (most recent call last):

  File "<ipython-input-76-4f1765267e06>", line 1, in <module>
    runfile('D:/Nextcloud/Python/Gamebot/model.py', wdir='D:/Nextcloud/Python/Gamebot')

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
    execfile(filename, namespace)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/Nextcloud/Python/Gamebot/model.py", line 83, in <module>
    loss = criterion(outputs,labels)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\loss.py", line 759, in forward
    self.ignore_index, self.reduce)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 1442, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 1332, in nll_loss
    return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)

RuntimeError: multi-target not supported at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thnn\generic/ClassNLLCriterion.c:22

And the code:


# Import all torch libraries
import torch
import torchvision
import torchvision.transforms as transforms


from PIL import Image
from CustomDataset import CustomMouseDataset,Rescale


def load_data():
#    transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
#    
#    #Load Recorded Data
#    with h5py.File('data/video_data_22_7_2018_17_46','r') as data:
#        video = data['video'][()]
#        mouse = data['mouse'][()]
#    video = video[:50]
#    mouse = mouse[:50]    
    transform = transforms.Compose([transforms.ToTensor()])
    train_data = CustomMouseDataset('data/video_data_22_7_2018_17_46',transform)
    
    
    train_loader = torch.utils.data.DataLoader(train_data,batch_size=10,shuffle=True)
    
    return train_loader

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1,6,5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1,16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
net = Net()
train_data = load_data()

import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr = 0.001, momentum = 0.9)

#Train the Network
for epoch in range(2):
    running_loss = 0.0
    for i,data in enumerate(train_data,0):
        inputs = data['frame']
      
        labels = data['mouse']
        labels = labels.long()
   
        print(type(labels))
        print(labels)
        print(labels.shape)

        #Zero gradients Parameter
        optimizer.zero_grad()
        
        #forward + backward +optimize
        outputs = net(inputs)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step()
        if i % 4 == 1:    # print every 4 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 5))
            running_loss = 0.0

print('Finished Training')

Your labels should contain the class indices, not the one-hot encoded version.
From the shape information you’ve printed it looks like you are encoding the class as a tensor of shape [10, 3]. Instead your target should be a torch.LongTensor of shape [10] containing values between 0 and 2.
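A small sketch of that conversion, using random logits and hypothetical one-hot targets for 10 samples and 3 classes: argmax over the class dimension turns a one-hot [10, 3] tensor into a LongTensor of class indices with shape [10], which is what nn.CrossEntropyLoss expects.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

outputs = torch.randn(10, 3)  # raw scores (logits) for 10 samples and 3 classes

# Hypothetical one-hot targets of shape [10, 3]
one_hot = torch.zeros(10, 3)
one_hot[torch.arange(10), torch.randint(0, 3, (10,))] = 1

# Class indices of shape [10], dtype torch.long (int64)
labels = one_hot.argmax(dim=1)
print(labels.shape, labels.dtype)  # torch.Size([10]) torch.int64

loss = criterion(outputs, labels)
```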

The problem is that my labels aren't classes. They are integer coordinate values: a single label is, e.g., [x coordinate of my mouse, y coordinate of my mouse, click or no click], for example [1234, 254, 1], and there is one such label for every sample in the batch.
How do I convert my vector properly into this format?

OK, I see. Have you thought about using a regression with nn.MSELoss instead?
As your coordinates are continuous, a classification setup might be pretty hard to train.
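A minimal regression sketch, assuming the model outputs (x, y) for each of 10 frames (random stand-in outputs here). Unlike nn.CrossEntropyLoss, nn.MSELoss needs a float target with the same shape as the output, so the integer coordinates are cast with .float():

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()

# Stand-in for the model output: predicted (x, y) for 10 frames
outputs = torch.randn(10, 2, requires_grad=True)

# Integer mouse coordinates, one row per sample, e.g. [1234, 254]
coords = torch.tensor([[1234, 254]] * 10)

# MSELoss: float target, same shape as the output
target = coords.float()
loss = criterion(outputs, target)
loss.backward()
```

With pixel-scale targets the raw loss is very large; rescaling the coordinates (for example to [0, 1] by dividing by the screen width and height) or scaling the loss, as done in a later example in this thread, usually makes training more stable.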

I haven't tried it yet. I am quite new to PyTorch and this field in general, so I don't know all the different loss functions, optimizers, and so on. Do I just have to change the loss function, or what else do I need to change?

//Edit: OK, it works with this loss function, but the calculated loss is always 0.000?
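One likely cause in the training loop posted above: running_loss is reset to 0.0 but never increased, so the printed value is 0.000 no matter what the actual loss is. A sketch of the usual accumulation pattern, with a tiny linear model and random batches as stand-ins (assumptions) for net and the DataLoader:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-ins for the real network and DataLoader
net = nn.Linear(4, 3)
batches = [(torch.randn(10, 4), torch.randn(10, 3)) for _ in range(8)]

criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

logged = []
for epoch in range(2):
    running_loss = 0.0
    for i, (inputs, targets) in enumerate(batches):
        optimizer.zero_grad()
        loss = criterion(net(inputs), targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()  # the accumulation step missing in the loop above
        if i % 4 == 3:               # print the average over every 4 mini-batches
            avg = running_loss / 4
            logged.append(avg)
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, avg))
            running_loss = 0.0
```

If the loss is accumulated and still prints as 0.000, the value may simply be below 0.0005, which '%.3f' rounds to zero; printing more digits (e.g. '%.6f') would show it.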

//Edit 2: And if I change the input image size from 32x32 to other values, it complains about:

>     x = x.view(-1,16*5*5)
> 
> RuntimeError: invalid argument 2: size '[-1 x 1600]' is invalid for input with 27040 elements at ..\src\TH\THStorage.c:37

How can I get the proper value for the flatten step?
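One common way is to let the model compute the value itself: push a dummy tensor through the conv/pool part once in __init__ and read off the flattened size. By hand, each 5x5 convolution without padding shrinks H and W by 4 and each 2x2 pooling halves them (rounding down), so a 64x64 input gives 64 -> 60 -> 30 -> 26 -> 13, i.e. 16*13*13 = 2704 features per sample; for a batch of 10 that matches the 27040 elements in the error. In this sketch, the height/width constructor arguments and the _features helper are illustrative additions, not part of the original Net:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, in_channels=1, height=64, width=64):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Run a dummy tensor through the conv part once to get the flattened size
        with torch.no_grad():
            dummy = torch.zeros(1, in_channels, height, width)
            n_features = self._features(dummy).view(1, -1).size(1)
        self.fc1 = nn.Linear(n_features, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def _features(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return x

    def forward(self, x):
        x = self._features(x)
        x = x.view(x.size(0), -1)  # flatten per sample, keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

net = Net(height=64, width=64)
print(net.fc1.in_features)  # 2704 == 16 * 13 * 13
out = net(torch.randn(10, 1, 64, 64))
print(out.shape)  # torch.Size([10, 10])
```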

You could split the outputs into the regression problem (mouse coordinates) and the classification problem (click/no-click).

Each output should be passed to its own loss function.
Here is a very simple example you could use as a starting point:

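import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)  # 3x3 conv, padding 1 keeps H and W
        self.pool1 = nn.MaxPool2d(2)            # halves H and W
        self.fc1 = nn.Linear(6*12*12, 20)       # 6 channels * 12 * 12 for a 24x24 input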
        
        self.fc2a = nn.Linear(20, 2) # Regression
        self.fc2b = nn.Linear(20, 2) # Classification
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        
        x1 = self.fc2a(x)
        x2 = F.log_softmax(self.fc2b(x), dim=1)
        return x1, x2


model = MyModel()
criterion1 = nn.MSELoss()
criterion2 = nn.NLLLoss()

x = torch.randn(1, 1, 24, 24)
target1 = torch.empty(1, 2).random_(2000)
target2 = torch.empty(1, dtype=torch.long).random_(2)

output1, output2 = model(x)
loss1 = criterion1(output1, target1) / 2000**2 # Scale loss
loss2 = criterion2(output2, target2)
loss = loss1 + loss2
loss.backward()

There are several ways to deal with your problem, and this is just one possible approach.
Let me know if this works for you.

So far so good. I just don't understand why the labels are random values.
Don't I have to use my images as input?

You should of course use your data and targets. I just created an executable code snippet for you to check for the shapes etc.

    x = F.relu(self.fc1(x))

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())

RuntimeError: size mismatch, m1: [10 x 1536], m2: [864 x 20] at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\th\generic/THTensorMath.c:2033

Do I have the wrong input size? Or do I have to change the number of input features of the Linear layer?

Yes, you have to change the in_features of the linear layer to fit your data shape. I used 24x24 images. What is your image size?

Mine is 64x64. Which input size values do I have to change?

If you want to use my simple model, you would have to change self.fc1 to:

self.fc1 = nn.Linear(6*32*32, 20)

It still complains about size:

    return F.linear(input, self.weight, self.bias)

  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())

RuntimeError: size mismatch, m1: [10 x 1536], m2: [6144 x 20] at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\th\generic/THTensorMath.c:2033
Even after I changed it to 6*32*32.

Are you using your code? Then you are pooling twice and should use 6*16*16 instead.
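For reference, a quick check of the flattened size that the conv/pool part of MyModel produces for a few square input sizes. The 1536 in both tracebacks equals 6*16*16, which is what MyModel's single pooling step gives for a 32x32 input; printing inputs.shape for one batch from the DataLoader may be the quickest way to confirm which size actually reaches the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Conv/pool part of MyModel: the padded 3x3 conv keeps H and W, the pool halves them
conv1 = nn.Conv2d(1, 6, 3, 1, 1)
pool1 = nn.MaxPool2d(2)

features = {}
for size in (24, 32, 64):
    x = pool1(F.relu(conv1(torch.randn(1, 1, size, size))))
    features[size] = x.view(1, -1).size(1)

print(features)  # {24: 864, 32: 1536, 64: 6144}
```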

No, I used yours. But if I try 16 * 16 * 6, I get this error:

input, target, size_average, reduce)


  File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 1537, in _pointwise_loss
    return lambd_optimized(input, target, size_average, reduce)

RuntimeError: input and target shapes do not match: input [10 x 2], target [1 x 3] at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thnn\generic/MSECriterion.c:13

Have a look at the different targets in my example:
target1 holds the coordinates and is used with nn.MSELoss, while target2 is used for the classification of the mouse click.
You would have to split your target into these two parts.
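A sketch of that split, assuming each label row has the format [x, y, click] described earlier in the thread (the three rows below are made up). The coordinates become a float [N, 2] target for nn.MSELoss and the click column a long [N] target for nn.NLLLoss; random tensors stand in for the two outputs of MyModel:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical batch of labels in the format [x, y, click], one row per sample
labels = torch.tensor([[1234, 254, 1],
                       [800, 600, 0],
                       [15, 30, 1]])

target1 = labels[:, :2].float()  # coordinates, shape [N, 2], for nn.MSELoss
target2 = labels[:, 2].long()    # click (1) / no click (0), shape [N], for nn.NLLLoss

# Random stand-ins for the two outputs of MyModel (batch of 3)
output1 = torch.randn(3, 2)
output2 = F.log_softmax(torch.randn(3, 2), dim=1)

loss1 = nn.MSELoss()(output1, target1) / 2000**2  # scaled like in the example above
loss2 = nn.NLLLoss()(output2, target2)
loss = loss1 + loss2
```

The error above also reports a target of [1 x 3] against an output of [10 x 2], so besides splitting, it is worth printing labels.shape to check that the labels coming from the DataLoader really have one row per sample ([10, 3] for a batch of 10).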