I have been reading most of the questions regarding nn.ModuleList() and I thought I understood how to use it. But, apparently, I am missing something here.
I am creating a network based on two nn.ModuleList() instances and use one after the other. To see whether it learns anything, I followed the PyTorch tutorial Training a Classifier and trained it on CIFAR-10. Every step is the same as in the tutorial except for my network, but the network does not learn anything, so I assume the issue is the way I am using nn.ModuleList().
I am not sure where or what I am doing wrong; everything looks okay to me. Can someone please tell me why it is not working and what I am doing wrong here?
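For reference, this is the pattern I believe is correct for iterating over an nn.ModuleList inside forward() (a minimal toy sketch, not my actual network):

import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super(Toy, self).__init__()
        # ModuleList registers its entries as submodules, so their
        # parameters are visible to the optimizer
        self.layers = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

    def forward(self, x):
        for layer in self.layers:  # a ModuleList is iterable like a list
            x = layer(x)
        return x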
Here is the network:
import torch.nn as nn
import torch.nn.functional as F

class Prediction(nn.Module):
    def __init__(self):
        super(Prediction, self).__init__()
        self.base = self.VGG16()
        self.prediction = self.Extention()

    def forward(self, x):
        # pass the input through every layer in the base ModuleList
        for i, name in enumerate(self.base):
            x = self.base[i](x)
        # copy x to x22 and flatten it for the classifier
        x22 = x.clone()
        S2 = x22.view(x22.size(0), 512)
        S2 = F.relu(self.prediction[0](S2))
        S2 = F.relu(self.prediction[1](S2))
        S2 = self.prediction[2](S2)
        return S2

    def VGG16(self):
        # VGG-16 configuration: numbers are conv output channels, 'M' is max-pooling
        cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
        layers = nn.ModuleList()
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True))
            else:
                layers.append(nn.Conv2d(in_channels, x, kernel_size=3, padding=1))
                layers.append(nn.ReLU(True))
                in_channels = x
        return layers

    def Extention(self):
        # classifier head: three fully connected layers
        prediction = nn.ModuleList()
        FC1 = nn.Linear(512, 120)
        FC2 = nn.Linear(120, 84)
        FC3 = nn.Linear(84, 10)
        prediction.append(FC1)
        prediction.append(FC2)
        prediction.append(FC3)
        return prediction

net = Prediction()
self.base is VGG16 without the classifier layers, and self.prediction (built by Extention()) is the classifier.
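To rule out a registration problem, here is a quick sanity check that the ModuleList parameters are visible through net.parameters() and that a dummy forward pass gives the expected output shape (a minimal sketch, assuming CIFAR-10's 3x32x32 inputs):

import torch

# all conv/linear weights inside the two ModuleLists should be
# visible through net.parameters() if registration works
print(sum(p.numel() for p in net.parameters()))

# dummy CIFAR-10-sized batch: the five poolings reduce 32x32 to 1x1,
# so the flattened features are 512-dimensional
dummy = torch.randn(2, 3, 32, 32)
print(net(dummy).shape)  # expect torch.Size([2, 10])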
Here is also the whole code:
import torch
import torchvision
import torchvision.transforms as transforms

bb = 4

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=bb,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=bb,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

import matplotlib.pyplot as plt
import numpy as np

# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)
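# (Added for reference: display the sampled batch and its labels,
# as done in the tutorial.)
imshow(torchvision.utils.make_grid(images))
print(' '.join('%5s' % classes[labels[j]] for j in range(bb)))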
import torch.nn as nn
import torch.nn.functional as F

class Prediction(nn.Module):
    def __init__(self):
        super(Prediction, self).__init__()
        self.base = self.VGG16()
        self.prediction = self.Extention()

    def forward(self, x):
        # pass the input through every layer in the base ModuleList
        for i, name in enumerate(self.base):
            x = self.base[i](x)
        # copy x to x22 and flatten it for the classifier
        x22 = x.clone()
        S2 = x22.view(x22.size(0), 512)
        S2 = F.relu(self.prediction[0](S2))
        S2 = F.relu(self.prediction[1](S2))
        S2 = self.prediction[2](S2)
        return S2

    def VGG16(self):
        cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
        layers = nn.ModuleList()
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True))
            else:
                layers.append(nn.Conv2d(in_channels, x, kernel_size=3, padding=1))
                layers.append(nn.ReLU(True))
                in_channels = x
        return layers

    def Extention(self):
        prediction = nn.ModuleList()
        FC1 = nn.Linear(512, 120)
        FC2 = nn.Linear(120, 84)
        FC3 = nn.Linear(84, 10)
        prediction.append(FC1)
        prediction.append(FC2)
        prediction.append(FC3)
        return prediction

net = Prediction()
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:    # print every 200 mini-batches (note: still dividing by 2000)
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
The output is:
Files already downloaded and verified
Files already downloaded and verified
[1, 200] loss: 0.230
[1, 400] loss: 0.230
[1, 600] loss: 0.230
[1, 800] loss: 0.230
[1, 1000] loss: 0.230
[1, 1200] loss: 0.230
[1, 1400] loss: 0.230
[1, 1600] loss: 0.230
[1, 1800] loss: 0.230
[1, 2000] loss: 0.230
[1, 2200] loss: 0.230
[1, 2400] loss: 0.230
[1, 2600] loss: 0.230
[1, 2800] loss: 0.230
[1, 3000] loss: 0.230
[1, 3200] loss: 0.230
[1, 3400] loss: 0.230
[1, 3600] loss: 0.230
[1, 3800] loss: 0.230
[1, 4000] loss: 0.230
[1, 4200] loss: 0.230
[1, 4400] loss: 0.230
[1, 4600] loss: 0.230
So, as you can see, the loss is not decreasing at all. (The printed 0.230 is the running loss over 200 mini-batches divided by 2000, i.e. an average loss of about 2.30 per batch, which is roughly ln(10), chance level for 10 classes.)
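In case it helps, here is how one could check whether gradients reach the early layers after a single backward pass (a minimal sketch using the objects defined above):

# one forward/backward pass, then inspect gradient magnitudes per parameter
inputs, labels = next(iter(trainloader))
optimizer.zero_grad()
loss = criterion(net(inputs), labels)
loss.backward()
for name, p in net.named_parameters():
    if p.grad is not None:
        print(name, p.grad.abs().mean().item())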