Output different dimensions

Hello! I'm trying to understand how to work with images in PyTorch, and I'm quite new to all of this. My problem is that I expect the output to be a list of two numbers, but the output has the shape [1, 18, 12, 2].

How do I get the format I want?

There is another thing I don't get: the last convolution layer outputs 18 channels, so why does the output have 12 layers?

Any advice is appreciated!

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from os import listdir
from torch.autograd import Variable
from torchvision import transforms
from PIL import Image

class Netz(nn.Module): # The network
    
    def __init__(self):
        super(Netz,self).__init__()
        
        self.cn1 = nn.Conv2d(3,5, kernel_size=5) 
        self.cn2 = nn.Conv2d(5,7, kernel_size=5)
        self.cn3 = nn.Conv2d(7,12, kernel_size=5)
        self.cn4 = nn.Conv2d(12,18, kernel_size=5)
        self.ln1 = nn.Linear(12,200)
        self.ln2 = nn.Linear(200,20)
        self.ln3 = nn.Linear(20,10)
        self.ln4 = nn.Linear(10,2)

    def forward(self, x): 

        x = self.cn1(x)
        x = F.max_pool2d(x,2) 
        x = F.relu(x)
        x = self.cn2(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.cn3(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.cn4(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)

        x = F.relu(self.ln1(x)) 
        x = F.relu(self.ln2(x))
        x = F.relu(self.ln3(x))
        return F.relu(self.ln4(x)) 
    
model = Netz()


train_data_list = []
target_list = []
train_data = []
normalize = transforms.Normalize(
    mean = [0.485, 0.456, 0.406],
    std = [0.229, 0.224, 0.225]
    ) 
transform = transforms.Compose([transforms.Resize(256),
                                transforms.CenterCrop(256),
                                transforms.ToTensor(),
                                normalize])  # list of "commands" that are executed when transform is called
for f in listdir("E:/DataAI"):
    img = Image.open("E:/DataAI/" + f)
    img_tensor = transform(img)
    img_tensor.unsqueeze_(0)
    train_data_list.append(img_tensor)
    splitFN = f.split("-")
    x = splitFN[1].replace('x', '')
    y = splitFN[2].replace('y','')
    
    #print("x: " + x + " y: " + y)
    target = [int(x),int(y)]
    target_list.append(target)
    
    train_data.append((img_tensor, target))
    if len(train_data_list) >= 5:
        train_data.append((torch.stack(train_data_list), target_list))
        train_data_list = []
        target_list = []
    
    
optimizer = optim.Adam(model.parameters(), lr=0.01)
def train(gen):
    size = len(train_data)
    count = 0
    model.train()
    for data, target in train_data:
        data = Variable(data)
        target = torch.Tensor(target)
        target = Variable(target)
        optimizer.zero_grad()
        out = model(data)

        criterion = F.binary_cross_entropy
        loss = criterion(out, target)
        loss.backward()
        optimizer.step()
        count+=1
        print("Gen: {gene} Count: {counte} Percentage: {perc}".format(gene=gen, counte=count, perc=(count/size)*100))
        

train(1)

OK, so first off, the CNN is outputting the correct number of channels. The shape of a convolution output is [batch_size, channels, height, width], so the second number is the channel count, which is 18, and that is correct. Also, is your forward function working? It does not look like it should do what you want. You are not flattening the image, so you are passing a four-dimensional tensor into linear layers that expect a [batch_size, features] input. Your first linear layer also does not have enough inputs to take an image of that size. To flatten the output you can do this:

x = x.view(-1, x.shape[1] * x.shape[2] * x.shape[3])
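To see where the [1, 18, 12, 2] comes from, here is a minimal sketch that rebuilds only the convolution stack from the question and feeds it a random tensor. The 256x256 input size is taken from the transform in the question; the `nn.Sequential` wrapper and the variable names are just for illustration. As far as I can tell, `nn.Linear` is applied to the last dimension of whatever it receives, which is why the unflattened tensor still goes through without an error.

```python
import torch
import torch.nn as nn

# Same conv/pool/ReLU order as in the question (kernel_size=5, no padding).
convs = nn.Sequential(
    nn.Conv2d(3, 5, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(5, 7, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(7, 12, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(12, 18, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
)

x = torch.randn(1, 3, 256, 256)        # dummy batch with one 256x256 RGB image
features = convs(x)
conv_shape = tuple(features.shape)     # (1, 18, 12, 12)

# A linear layer with 12 inputs only touches the last dimension (size 12),
# so the batch, channel and height dimensions are carried along unchanged.
unflattened_shape = tuple(nn.Linear(12, 2)(features).shape)  # (1, 18, 12, 2)

# Flattening keeps the batch dimension and merges everything else.
flat = features.view(features.shape[0], -1)
flat_shape = tuple(flat.shape)         # (1, 2592)

print(conv_shape, unflattened_shape, flat_shape)
```

So the 12 in the output is a leftover spatial dimension of the feature map, not a channel count.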

If the output of your CNN is that big, that means the input of ln1 should be 432. One last problem is that you do not zero out the gradients in your training loop. After you do optimizer.step() you need to do this:

optimizer.zero_grad()

Here is the PyTorch classification tutorial if you want to look at it for reference. You also need to change your loss to binary_cross_entropy_with_logits, which includes a sigmoid.
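A small check of what binary_cross_entropy_with_logits does, assuming random logits and random 0/1 targets (both made up for this example): it should give the same value as applying a sigmoid first and then calling binary_cross_entropy.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)                      # raw model outputs, no activation
targets = torch.randint(0, 2, (4, 2)).float()   # hypothetical 0/1 labels

# The "with_logits" variant applies the sigmoid internally.
loss_with_logits = F.binary_cross_entropy_with_logits(logits, targets)

# Equivalent two-step version: explicit sigmoid, then plain BCE.
loss_manual = F.binary_cross_entropy(torch.sigmoid(logits), targets)

same = torch.allclose(loss_with_logits, loss_manual, atol=1e-6)
print(loss_with_logits.item(), loss_manual.item(), same)
```

Both losses expect targets between 0 and 1. If the two target numbers are something else, for example pixel coordinates, a regression loss such as F.mse_loss might fit better, but that depends on the data and is only a guess here.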

Thanks! Your answer helped me a lot!
optimizer.zero_grad() is already called before out = model(data).

The x.view call gives me a [1, 2] instead of a [2], but that doesn't actually matter.

The 432 for ln1 does not work. I used the value of (x.shape[1] * x.shape[2] * x.shape[3]) instead, which is 2592, and this works.
The forward function does actually work, but I will look into it quite a bit more.
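For reference, one way to avoid hard-coding the 2592 is to measure it once with a dummy input when the network is built. This is only a sketch: the `image_size` argument, the `_features` helper and the `flat_features` attribute are names invented for this example, and the final ReLU from the original network is dropped so that the raw outputs can be passed to a loss that expects logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Netz(nn.Module):
    def __init__(self, image_size=256):  # image_size: assumed input height/width
        super().__init__()
        self.cn1 = nn.Conv2d(3, 5, kernel_size=5)
        self.cn2 = nn.Conv2d(5, 7, kernel_size=5)
        self.cn3 = nn.Conv2d(7, 12, kernel_size=5)
        self.cn4 = nn.Conv2d(12, 18, kernel_size=5)

        # Push one dummy image through the conv stack to measure the flattened size.
        with torch.no_grad():
            dummy = torch.zeros(1, 3, image_size, image_size)
            self.flat_features = self._features(dummy).shape[1]

        self.ln1 = nn.Linear(self.flat_features, 200)
        self.ln2 = nn.Linear(200, 20)
        self.ln3 = nn.Linear(20, 10)
        self.ln4 = nn.Linear(10, 2)

    def _features(self, x):
        # conv -> max pool -> ReLU, four times, then flatten to [batch, features]
        for conv in (self.cn1, self.cn2, self.cn3, self.cn4):
            x = F.relu(F.max_pool2d(conv(x), 2))
        return x.view(x.shape[0], -1)

    def forward(self, x):
        x = self._features(x)
        x = F.relu(self.ln1(x))
        x = F.relu(self.ln2(x))
        x = F.relu(self.ln3(x))
        return self.ln4(x)  # raw outputs, no final ReLU (a choice made for this sketch)


model = Netz()
out = model(torch.randn(1, 3, 256, 256))
out_shape = tuple(out.shape)                  # (1, 2): one row per image in the batch
squeezed_shape = tuple(out.squeeze(0).shape)  # (2,): batch dimension removed
print(model.flat_features, out_shape, squeezed_shape)
```

The [1, 2] is the batch dimension followed by the two outputs, so squeeze(0) gives the plain [2] for a single image.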

Oh sorry, I did not see the zero_grad. Glad I could help. Also, another small thing: it doesn't really matter, but it sometimes looks cleaner to define the loss function before the loop and just use it inside the loop, like this:

criterion = nn.CrossEntropyLoss()

outside the for loop, and then in the for loop you would only have to do this:

loss = criterion(output,labels)
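Put together, a loop with the criterion created once could look roughly like this. The tiny linear model, the random batches and the three classes are placeholders made up for this sketch, not the network or data from the question; nn.CrossEntropyLoss expects integer class indices as labels.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(8, 3)                       # hypothetical stand-in model, 3 classes
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()             # created once, outside the loop

# Four random batches of 5 samples each, with class-index labels in {0, 1, 2}.
batches = [(torch.randn(5, 8), torch.randint(0, 3, (5,))) for _ in range(4)]

losses = []
model.train()
for data, labels in batches:
    optimizer.zero_grad()                     # clear gradients from the previous step
    output = model(data)
    loss = criterion(output, labels)          # reuse the same criterion object
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses)
```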