Design a neural network

Hi. I’m new to deep learning. Starting from an example written for MNIST, I’m trying something new: I imported the Kaggle kernel “Images of Lego Bricks”. But the pictures are a different size, and I get an error.

shape ‘[-1, 256]’ is invalid for input of size 141376

I changed 16 * 4 * 4 to 16 * 94 * 94, but got a new error:

Expected input batch_size (1) to match target batch_size (4).

I changed batch_size to 1 and got a new one:

shape ‘[-1, 141376]’ is invalid for input of size 35344

It seems to me that the neural network is not correct. Please tell me how to design a neural network for this case. Thanks.

P.S. If you know of an article explaining how to do this, please share a link.

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.optim as optim
import numpy as np
import os
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 *4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

classes = os.listdir("F:\\projects\\conda\\datasets\\lego-brick-images\\train")
batch_size = 4
epochs=30
lr=0.01
log_interval=1000

train_dataset = torchvision.datasets.ImageFolder(root="F:\\projects\\conda\\datasets\\lego-brick-images\\train", 
                                                 transform=transforms.ToTensor())
test_dataset = torchvision.datasets.ImageFolder(root="F:\\projects\\conda\\datasets\\lego-brick-images\\valid", 
                                                transform=transforms.ToTensor())

trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

model = Net();
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)  

def train(model, train_loader, optimizer, epoch, log_interval):
    model.train()
    avg_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad() 
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        avg_loss+=F.nll_loss(output, target, reduction='sum').item()
        
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
    avg_loss/=len(train_loader.dataset)
    return avg_loss

def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    accuracy = 100. * correct / len(test_loader.dataset)
    return test_loss,accuracy

train_losses = []
test_losses = []
accuracy_list = []
for epoch in range(1, epochs + 1):
    trn_loss = train(model, trainloader, optimizer, epoch, log_interval)
    test_loss, accuracy = test(model, testloader)
    train_losses.append(trn_loss)
    test_losses.append(test_loss)
    accuracy_list.append(accuracy)

You need to keep detailed track of what shape the image is after being passed through convolution and pooling layers. In this case after conv+pool+conv+pool the spatial size is 47x47. So your self.fc1 layer needs to have an input size of 16 * 47 * 47.

Input size: 200x200
After one 5x5 conv2d: 196x196
After a 2x2 pool2d: 98x98
After another 5x5 conv2d: 94x94
After the final 2x2 pool2d: 47x47
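
Those numbers follow the standard Conv2d/MaxPool2d output-size formula, out = floor((in - kernel + 2*padding) / stride) + 1. Here is a quick sanity check; it assumes the Lego images really are 200x200, which is what the 35344 = 16 * 47 * 47 in your last error message suggests:

def out_size(in_size, kernel, stride=1, padding=0):
    # standard output-size formula for Conv2d and MaxPool2d
    return (in_size - kernel + 2 * padding) // stride + 1

s = 200                       # assumed input height/width
s = out_size(s, 5)            # conv1, 5x5, no padding -> 196
s = out_size(s, 2, stride=2)  # 2x2 max pool           -> 98
s = out_size(s, 5)            # conv2, 5x5, no padding -> 94
s = out_size(s, 2, stride=2)  # 2x2 max pool           -> 47
print(s, 16 * s * s)          # 47 35344, matching the error message above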

It’s easier to keep track of if you set the conv2d layers to have a padding of 1 (for 3x3 kernels) or 2 (for 5x5 kernels); that way the only thing that changes the spatial extent of the image is the pooling.
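
For reference, a minimal sketch of the corrected model under that assumption (200x200 RGB inputs; the 10-way output is kept from your original code). Only the fc1 size and the view() call change:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 47 * 47, 120)  # 47x47 spatial size after the conv/pool stack above
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)  # flatten to (batch, 16 * 47 * 47), keeping the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Using x.view(x.size(0), -1) instead of x.view(-1, ...) also explains the “Expected input batch_size (1) to match target batch_size (4)” error you hit: with a wrong flattened size, view(-1, ...) silently folds the batch dimension into the features, so keeping the batch dimension explicit is the safer pattern.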

Thanks! It worked.

I don’t know what the problem is; the neural network is not learning.
Could you take another look and see what the problem might be?

Train Epoch: 1 [0/6379 (0%)] Loss: -0.002382

Test set: Average loss: nan, Accuracy: 400/6379 (6%)

Train Epoch: 2 [0/6379 (0%)] Loss: nan

Test set: Average loss: nan, Accuracy: 400/6379 (6%)

It looks like you’re using F.nll_loss() but you’re returning raw unnormalized scores from your model. You either need to:

(a) use a cross entropy loss (either the one you’re assigning to criterion and never using again, or F.cross_entropy()); or

(b) continue to use the NLL loss but use nn.LogSoftmax() (or F.log_softmax()) as the last layer of your model. The NLL loss expects log-probabilities as its input, so you have to transform your unnormalized scores with a (log) softmax.
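
As a rough sketch of option (b), only the last line of forward() needs to change (option (a) would instead keep the model as-is and swap F.nll_loss(output, target) for F.cross_entropy(output, target) in your train and test loops):

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)  # NLL loss expects log-probabilities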