The arguments to nn.Conv2d are the number of input channels, the number of output channels, and the kernel size. Your call should look like that example.
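As a minimal sketch of what the first argument controls (the batch of random 32x32 single-channel images here is a hypothetical stand-in, not the asker's data): a layer built with 3 input channels rejects a 1-channel batch, while a layer built with 1 input channel accepts it.

```python
import torch
import torch.nn as nn

# nn.Conv2d(in_channels, out_channels, kernel_size)
conv_rgb = nn.Conv2d(3, 6, 5)   # expects input of shape [N, 3, H, W]
conv_gray = nn.Conv2d(1, 6, 5)  # expects input of shape [N, 1, H, W]

# Hypothetical batch of 10 single-channel 32x32 images
x = torch.randn(10, 1, 32, 32)

out = conv_gray(x)
print(out.shape)  # torch.Size([10, 6, 28, 28])

# Feeding the 1-channel batch to the 3-channel layer raises a RuntimeError
try:
    conv_rgb(x)
    channel_error = False
except RuntimeError as err:
    channel_error = True
    print(err)
```

The in_channels value has to match dimension 1 of the input tensor, so grayscale frames need 1 there.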
Then it complains about this:
RuntimeError: Given groups=1, weight[6, 3, 5, 5], so expected input[10, 1, 560, 656] to have 3 channels, but got 1 channels instead
Or, if I try it with just one input channel, because the image only has one:
RuntimeError: invalid argument 2: size '[-1 x 400]' is invalid for input with 3529120 elements at ..\src\TH\THStorage.c:37
The code for my first error:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
And the code for my second error:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
Can you update your code to show what you have now?
Hmm, I still get the error:
x = x.view(-1, 16 * 5 * 5)
RuntimeError: invalid argument 2: size '[-1 x 400]' is invalid for input with 134560 elements at ..\src\TH\THStorage.c:37
Maybe it has something to do with the x = x.view(-1, 16*5*5) line?
Apparently something goes wrong in the step that flattens the data.
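The numbers in the error can be traced with a little arithmetic. The sketch below applies the standard output-size formula for unpadded convolution and pooling layers to the Net above; the helper names conv_out and feature_side are made up for illustration. The image size is not stated at this point in the thread, so the 128x128 case is only an inference from the element count, assuming a batch of 10.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_side(size):
    """Side length of the feature map after conv1 -> pool -> conv2 -> pool."""
    size = conv_out(size, 5)      # conv1: 5x5 kernel
    size = conv_out(size, 2, 2)   # pool: 2x2 window, stride 2
    size = conv_out(size, 5)      # conv2: 5x5 kernel
    size = conv_out(size, 2, 2)   # pool: 2x2 window, stride 2
    return size


# 32x32 inputs give 5x5 maps, which is where 16 * 5 * 5 = 400 comes from
print(feature_side(32))

# 128x128 inputs would give 29x29 maps; with 16 channels and a batch of 10
# that is 10 * 16 * 29 * 29 = 134560 elements, the count in the error
side = feature_side(128)
print(side, 10 * 16 * side * side)
```

So view(-1, 16 * 5 * 5) only fits 32x32 inputs; for any other image size the second argument has to be 16 times the squared side length computed this way.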
OK, it works so far. But now I get problems with my labels.
Apparently the labels have the wrong shape?
The error is:
torch.Size([10, 3])
Traceback (most recent call last):
File "<ipython-input-76-4f1765267e06>", line 1, in <module>
runfile('D:/Nextcloud/Python/Gamebot/model.py', wdir='D:/Nextcloud/Python/Gamebot')
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
execfile(filename, namespace)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/Nextcloud/Python/Gamebot/model.py", line 83, in <module>
loss = criterion(outputs,labels)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\loss.py", line 759, in forward
self.ignore_index, self.reduce)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 1442, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 1332, in nll_loss
return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: multi-target not supported at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thnn\generic/ClassNLLCriterion.c:22
And the code:
# Import all torch libraries
import torch
import torchvision
import torchvision.transforms as transforms
from PIL import Image
from CustomDataset import CustomMouseDataset, Rescale

def load_data():
    # transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
    #
    # # Load recorded data
    # with h5py.File('data/video_data_22_7_2018_17_46','r') as data:
    #     video = data['video'][()]
    #     mouse = data['mouse'][()]
    # video = video[:50]
    # mouse = mouse[:50]
    transform = transforms.Compose([transforms.ToTensor()])
    train_data = CustomMouseDataset('data/video_data_22_7_2018_17_46', transform)
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=10, shuffle=True)
    return train_loader

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
train_data = load_data()

import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the network
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(train_data, 0):
        inputs = data['frame']
        labels = data['mouse']
        labels = labels.long()
        print(type(labels))
        print(labels)
        print(labels.shape)
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        if i % 4 == 1:  # print every 4 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 5))
            running_loss = 0.0
print('Finished Training')
Your labels should contain the class indices, not the one-hot encoded version.
From the shape information you’ve printed, it looks like you are encoding the class as a tensor of shape [10, 3]. Instead, your target should be a torch.LongTensor of shape [10] containing values between 0 and 2.
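To illustrate the difference in shape, here is a small sketch with made-up data: a one-hot tensor of shape [10, 3] is reduced to class indices of shape [10] with argmax before being passed to nn.CrossEntropyLoss. The logits and one_hot tensors are hypothetical stand-ins for a model's outputs and a dataset's labels.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Hypothetical model outputs: batch of 10 samples, 3 classes
logits = torch.randn(10, 3)

# Hypothetical one-hot labels of shape [10, 3]
one_hot = torch.eye(3)[torch.randint(0, 3, (10,))]

# Class-index targets: shape [10], dtype torch.int64, values in {0, 1, 2}
targets = one_hot.argmax(dim=1)

loss = criterion(logits, targets)
print(targets.shape, targets.dtype, loss.item())
```

This only applies when the labels really are one-hot class encodings; it does not help if the three columns hold unrelated quantities.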
The problem is that my labels aren't classes. They are integer coordinate values. A single label looks like [x coordinate of my mouse, y coordinate of my mouse, click or no click], for example [1234, 254, 1], and there is one such label for every sample in the batch.
How do I convert my vector properly to this?
OK, I see. Have you thought about using a regression with nn.MSELoss instead?
As your coordinates are continuous, a classification setup might be pretty hard to train.
I haven't tried it yet. I am quite new to PyTorch and this field in general, so I don't know all of the different loss functions, optimizers, and so on. So do I just have to change the loss function, or what else do I need to change?
//Edit: OK, it works with this loss function, but the calculated loss is always 0.000?
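One likely cause of a constant 0.000, judging from the training loop posted earlier, is that running_loss is set to 0.0 and printed but never incremented. The sketch below shows the usual accumulation pattern; the linear model and the random batches are hypothetical stand-ins so that the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real network and DataLoader
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
batches = [(torch.randn(10, 4), torch.randn(10, 2)) for _ in range(8)]

print_every = 4
running_loss = 0.0
reported = []

for i, (inputs, targets) in enumerate(batches):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # Accumulate the scalar loss value; without this line the average stays 0
    running_loss += loss.item()

    if i % print_every == print_every - 1:
        average = running_loss / print_every
        reported.append(average)
        print('[%5d] loss: %.3f' % (i + 1, average))
        running_loss = 0.0
```

Dividing by the same number of batches that were accumulated keeps the printed value a true average.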
//Edit 2: And if I change the size of the input image from 32x32 to other values, it complains about:
> x = x.view(-1,16*5*5)
>
> RuntimeError: invalid argument 2: size '[-1 x 1600]' is invalid for input with 27040 elements at ..\src\TH\THStorage.c:37
How can I get the proper size for the flatten step?
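One common way to avoid computing the flattened size by hand is to push a dummy tensor through the convolutional part once and read off the result. The sketch below adapts the Net from this thread under that assumption; the image_size parameter, the _features helper, and the flat_features attribute are additions for illustration and not part of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, image_size=(64, 64)):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)

        # Run one dummy image through the conv layers to measure the
        # number of features that reach the first linear layer
        with torch.no_grad():
            dummy = torch.zeros(1, 1, *image_size)
            self.flat_features = self._features(dummy).view(1, -1).size(1)

        self.fc1 = nn.Linear(self.flat_features, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def _features(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return x

    def forward(self, x):
        x = self._features(x)
        x = x.view(x.size(0), -1)  # keep the batch dimension, flatten the rest
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net(image_size=(64, 64))
print(net.flat_features)  # 2704, i.e. 16 * 13 * 13
out = net(torch.randn(10, 1, 64, 64))
print(out.shape)
```

For 64x64 inputs this gives 2704 features per sample, which matches the 27040 elements reported in the error for a batch of 10. Using x.view(x.size(0), -1) instead of a hard-coded number also means a size mismatch shows up at the linear layer with a clearer message.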
You could split the outputs into the regression problem (mouse coordinates) and the classification problem (click/no-click).
Both outputs should be passed to the appropriate loss function.
Here is a very simple code example you could use as a starter:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.pool1 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(6*12*12, 20)
        self.fc2a = nn.Linear(20, 2)  # Regression
        self.fc2b = nn.Linear(20, 2)  # Classification

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x1 = self.fc2a(x)
        x2 = F.log_softmax(self.fc2b(x), dim=1)
        return x1, x2

model = MyModel()
criterion1 = nn.MSELoss()
criterion2 = nn.NLLLoss()

x = torch.randn(1, 1, 24, 24)
target1 = torch.empty(1, 2).random_(2000)
target2 = torch.empty(1, dtype=torch.long).random_(2)

output1, output2 = model(x)
loss1 = criterion1(output1, target1) / 2000**2  # Scale loss
loss2 = criterion2(output2, target2)
loss = loss1 + loss2
loss.backward()
There are several ways to deal with your problem, and this is just one possible approach.
Let me know if this works for you.
So far so good, I just don't understand why the labels are random values.
Don't I have to use my image as an input?
You should of course use your data and targets. I just created an executable code snippet for you to check for the shapes etc.
x = F.relu(self.fc1(x))
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 992, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [10 x 1536], m2: [864 x 20] at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\th\generic/THTensorMath.c:2033
Do I have the wrong input size? Or do I have to change the input features of the linear layer?
Yes, you have to change the in_features of the linear layer to fit your data shape. I used 24x24 images. What is your image size?
Mine is 64x64. Which input size values do I have to change?
If you want to use my simple model, you would have to change self.fc1 to:
self.fc1 = nn.Linear(6*32*32, 20)
It still complains about the size:
return F.linear(input, self.weight, self.bias)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 992, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [10 x 1536], m2: [6144 x 20] at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\th\generic/THTensorMath.c:2033
Even after I changed it to 6*32*32.
Are you using your code? Then you are pooling twice and should use 6*16*16 instead.
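The 1536 in both size-mismatch errors equals 6*16*16, which is what the example model produces for 32x32 inputs rather than 64x64. The thread does not say why the tensors are that size; one possibility (an assumption) is that the Rescale transform imported in the asker's code resizes the frames. The sketch below, with a made-up helper name flat_features, checks the arithmetic for the sizes mentioned so far against the example model's conv and pooling layers.

```python
import torch
import torch.nn as nn

# Same layers as the conv part of MyModel in this thread
conv1 = nn.Conv2d(1, 6, 3, 1, 1)  # 3x3 kernel, stride 1, padding 1: keeps H and W
pool1 = nn.MaxPool2d(2)           # halves H and W


def flat_features(height, width):
    """Features per sample reaching fc1: 6 * (height // 2) * (width // 2)."""
    return 6 * (height // 2) * (width // 2)


for size in (24, 32, 64):
    # Cross-check the formula against an actual forward pass
    with torch.no_grad():
        out = pool1(conv1(torch.zeros(1, 1, size, size)))
    measured = out.view(1, -1).size(1)
    print(size, flat_features(size, size), measured)
```

Printing inputs.shape right before the forward pass is a quick way to confirm which of these cases applies.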
No, I used yours. But if I try 16 * 16 * 6, I get the error:
input, target, size_average, reduce)
File "D:\Programme\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\functional.py", line 1537, in _pointwise_loss
return lambd_optimized(input, target, size_average, reduce)
RuntimeError: input and target shapes do not match: input [10 x 2], target [1 x 3] at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thnn\generic/MSECriterion.c:13
Have a look at the different targets in my example.
target1 keeps the coordinates and is used in nn.MSELoss, while target2 is used for the classification of the mouse click.
You would have to split your target into these two parts.
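A minimal sketch of that split, assuming the labels arrive as a tensor of shape [10, 3] in the order [x, y, click] as described earlier in the thread. The labels values and the random output tensors are hypothetical stand-ins for the dataset and for the two model outputs.

```python
import torch
import torch.nn as nn

# Hypothetical batch of labels in the form [x, y, click], shape [10, 3]
labels = torch.tensor([[1234, 254, 1]] * 10)

# Coordinates for the regression loss: float tensor of shape [10, 2]
target_coords = labels[:, :2].float()
# Click flag for the classification loss: long tensor of shape [10]
target_click = labels[:, 2].long()

# Stand-ins for the two model outputs (x1 and x2 in the example model)
output_coords = torch.randn(10, 2)
output_click = torch.log_softmax(torch.randn(10, 2), dim=1)

loss1 = nn.MSELoss()(output_coords, target_coords) / 2000**2  # scaled as in the example
loss2 = nn.NLLLoss()(output_click, target_click)
loss = loss1 + loss2
print(target_coords.shape, target_click.shape, loss.item())
```

The error above reports a target of shape [1 x 3] against an input of [10 x 2], so it may also be worth checking that the labels keep their batch dimension of 10 before they are split.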