How to predict the matrix before the last layer?

namduc · March 31, 2020, 2:44pm

I have trained a classification model whose layers are stacked in nn.Sequential blocks.
The ConvNet architecture looks like this:

class ConvNet(nn.Module):
    def __init__(self,num_classes=8):
        super(ConvNet,self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1,64,kernel_size=7),                                           
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2)                                 
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(64,128,kernel_size=7,stride=2),                               
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2)                                     
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(128,256,kernel_size=3),                                        
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2)                                 
        )
        self.layer4 = nn.Sequential(                                                
            nn.Conv2d(256,512,kernel_size=3),                                     
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2),                                   
            nn.BatchNorm2d(512)                                     
        )
        self.hidden = nn.Linear(2*2*512,1024)
        self.drop = nn.Dropout(0.6)
        self.dense1 = nn.Sequential(
            nn.Linear(1024,256),
            nn.ReLU(),
            nn.Dropout(0.25)
        )
        self.dense2 = nn.Sequential(
            nn.Linear(256,64),
            nn.ReLU()
        )
        self.fc1 = nn.Linear(64,32)
        self.fc = nn.Linear(32,num_classes)

    def forward(self,x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = out.reshape(out.size(0),-1)
        out = F.relu(self.hidden(out))
        out = self.drop(out)
        out = self.dense1(out)
        out = self.dense2(out)
        out = F.relu(self.fc1(out))
        out = self.fc(out)
        return out

ConvNet(
(layer1): Sequential(
(0): Conv2d(1, 64, kernel_size=(7, 7), stride=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
)
(layer2): Sequential(
(0): Conv2d(64, 128, kernel_size=(7, 7), stride=(2, 2))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
)
(layer3): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
)
(layer4): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(hidden): Linear(in_features=2048, out_features=1024, bias=True)
(drop): Dropout(p=0.6, inplace=False)
(dense1): Sequential(
(0): Linear(in_features=1024, out_features=256, bias=True)
(1): ReLU()
(2): Dropout(p=0.25, inplace=False)
)
(dense2): Sequential(
(0): Linear(in_features=256, out_features=64, bias=True)
(1): ReLU()
)
(fc1): Linear(in_features=64, out_features=32, bias=True)
(fc): Linear(in_features=32, out_features=8, bias=True)
)

Then I used the following code to remove the last layer, in order to obtain the 32-element output that feeds it:

model = ConvNet(8).to(device)
model.load_state_dict(torch.load('model.pt'))
removed = list(model.children())[:-1]
new_model= torch.nn.Sequential(*removed)
print(new_model)

Output:
Sequential(
(0): Sequential(
(0): Conv2d(1, 64, kernel_size=(7, 7), stride=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
)
(1): Sequential(
(0): Conv2d(64, 128, kernel_size=(7, 7), stride=(2, 2))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
)
(2): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
)
(3): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): AvgPool2d(kernel_size=2, stride=2, padding=0)
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): Linear(in_features=2048, out_features=1024, bias=True)
(5): Dropout(p=0.6, inplace=False)
(6): Sequential(
(0): Linear(in_features=1024, out_features=256, bias=True)
(1): ReLU()
(2): Dropout(p=0.25, inplace=False)
)
(7): Sequential(
(0): Linear(in_features=256, out_features=64, bias=True)
(1): ReLU()
)
(8): Linear(in_features=64, out_features=32, bias=True)
)
However, when I pass a new image through it to get the 32-element output, I get an error:

RuntimeError: size mismatch, m1: [1024 x 2], m2: [2048 x 1024] at C:/w/1/s/tmp_conda_3.8_075429/conda/conda-bld/pytorch_1579852542185/work/aten/src\THC/generic/THCTensorMathBlas.cu:290

Is something wrong here?
How can I get the 32-element output just before the last layer classifies it?

ptrblck · April 1, 2020, 6:10am

I assume you’ve tried to create the new model by wrapping the child modules into an nn.Sequential container.
If that’s the case, note that you will lose all functional API calls from the forward method in your original model, e.g. out = out.reshape(out.size(0),-1) as well as the F.relu calls.
Thus you should add them via e.g. nn.Flatten, nn.ReLU.

Alternatively, you could manipulate the forward method or use forward hooks to get the desired activation.
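
As a rough sketch of the forward-hook route, the snippet below registers a hook on fc1 and reads back its output without rebuilding the model. TinyNet is a hypothetical, much smaller stand-in for ConvNet (same pattern: a Sequential block, a functional reshape, then fc1 and fc), and the 8x8 random input is made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in for ConvNet: same pattern, far fewer layers."""

    def __init__(self, num_classes=8):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )
        self.fc1 = nn.Linear(4 * 3 * 3, 32)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = out.reshape(out.size(0), -1)  # functional call, not a child module
        out = F.relu(self.fc1(out))
        return self.fc(out)


model = TinyNet().eval()
captured = {}


def save_fc1_output(module, inputs, output):
    # Runs right after fc1's forward; `output` is the pre-ReLU activation.
    captured["fc1"] = output.detach()


handle = model.fc1.register_forward_hook(save_fc1_output)

with torch.no_grad():
    x = torch.randn(1, 1, 8, 8)  # one fake 8x8 grayscale image (assumed size)
    logits = model(x)

handle.remove()  # detach the hook once it is no longer needed

# The hook sees fc1's raw output; apply ReLU to match what fc receives.
features = F.relu(captured["fc1"])
print(logits.shape, features.shape)
```

The hook leaves the original forward untouched, so the reshape and F.relu calls still run as they did during training.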

namduc · April 1, 2020, 2:36pm

Can I use this module to replace out = out.reshape(out.size(0),-1)?

class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

Then the forward function looks like this:

def forward(self,x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = Flatten(out)
    out = nn.ReLU(self.hidden(out))
    out = self.drop(out)
    out = self.dense1(out)
    out = self.dense2(out)
    out = nn.ReLU(self.fc1(out))
    out = self.fc(out)
    return out

Is this right?

ptrblck · April 2, 2020, 12:25am

Generally yes, but you would have to create instances of these layers or use the functional API in your forward method:

def forward(self,x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = Flatten()(out)
    out = nn.ReLU()(self.hidden(out))
    out = self.drop(out)
    out = self.dense1(out)
    out = self.dense2(out)
    out = nn.ReLU()(self.fc1(out))
    out = self.fc(out)
    return out

namduc · April 3, 2020, 3:46am

I followed your forward function, but I still get the same error.
I put everything in the file test.py like this:

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

class ConvNet(nn.Module):
    def __init__(self,num_classes=8):
        super(ConvNet,self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1,64,kernel_size=7),                                          
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2)                                     
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(64,128,kernel_size=7,stride=2),                               
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2)                                    
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(128,256,kernel_size=3),                                        
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2)                                    
        )
        self.layer4 = nn.Sequential(                                                
            nn.Conv2d(256,512,kernel_size=3),                                        
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2,stride=2),                                   
            nn.BatchNorm2d(512)                                     
        )
        self.hidden = nn.Linear(2*2*512,1024)
        self.drop = nn.Dropout(0.6)
        self.dense1 = nn.Sequential(
            nn.Linear(1024,256),
            nn.ReLU(),
            nn.Dropout(0.25)
        )
        self.dense2 = nn.Sequential(
            nn.Linear(256,64),
            nn.ReLU()
        )
        self.fc1 = nn.Linear(64,32)
        self.fc = nn.Linear(32,num_classes)

    def forward(self,x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = Flatten()(out)
        out = nn.ReLU()(self.hidden(out))
        out = self.drop(out)
        out = self.dense1(out)
        out = self.dense2(out)
        out = nn.ReLU()(self.fc1(out))
        out = self.fc(out)
        return out

model = ConvNet(8).to(device)
model.load_state_dict(torch.load('model.pt'))
removed = list(model.children())[:-1]
new_model= torch.nn.Sequential(*removed)
with torch.no_grad():
     image_trans = torch.from_numpy(test_image.astype(np.float32)).to(device)
     prediction = new_model(image_trans)

Error:

Traceback (most recent call last):
  File "test.py", line 121, in <module>
    prediction = new_model(image_trans)
  File "C:\ProgramData\Anaconda3\envs\deep\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\deep\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\ProgramData\Anaconda3\envs\deep\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\deep\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\ProgramData\Anaconda3\envs\deep\lib\site-packages\torch\nn\functional.py", line 1372, in linear
    output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [1024 x 2], m2: [2048 x 1024] at C:/w/1/s/tmp_conda_3.8_075429/conda/conda-bld/pytorch_1579852542185/work/aten/src\THC/generic/THCTensorMathBlas.cu:290

ptrblck · April 3, 2020, 5:51am

Sorry for not being clear enough.

You won’t be able to copy the child modules directly to an nn.Sequential container, as all functional calls as well as locally defined modules in the original forward function will be missing.
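
A small sketch of why the missing flatten produces the [1024 x 2] shape in the error: nn.Linear only acts on the last dimension, so a 4-D feature map is treated as many short rows. The [1, 512, 2, 2] tensor is an assumption about what layer4 returns for a single image, inferred from nn.Linear(2*2*512,1024); the exact error text depends on the PyTorch version, so only the exception type is checked.

```python
import torch
import torch.nn as nn

hidden = nn.Linear(2 * 2 * 512, 1024)
feature_map = torch.randn(1, 512, 2, 2)  # assumed layer4 output for one image

# nn.Linear acts on the last dimension only. Here that dimension has size 2,
# and the leading dimensions collapse to 1 * 512 * 2 = 1024 rows.
rows = feature_map.numel() // feature_map.size(-1)
print(rows, feature_map.size(-1))

try:
    hidden(feature_map)  # no flatten: last dimension is 2, not 2048
    failed = False
except RuntimeError as err:
    failed = True
    print("without flatten:", type(err).__name__)

# Flattening everything except the batch dimension gives the expected 2048 features.
flat = nn.Flatten()(feature_map)
out = hidden(flat)
print(flat.shape, out.shape)
```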

Also, removed holds the modules in the order in which they were created in __init__.
If that order differs from the order in which forward calls them, the order of modules in the sequential container will be wrong as well.

To add the Flatten and ReLU modules into a sequential container, you could use an approach similar to this one:

new_model= torch.nn.Sequential(
    *(list(removed[:4]) + [nn.Flatten(), nn.ReLU()] + list(removed[5:7]) + [nn.ReLU()] + list(removed[-2:-1])))

Note that the order is currently wrong and the model won’t work.

nn.Sequential is used for simple models and as you can see, manipulating the forward method is easier by writing a custom module and defining the forward manually.
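
One possible shape for such a custom module, as a sketch only: a wrapper that reuses the trained submodules and replays the original forward up to, but not including, the last layer. TinyNet and FeatureExtractor are hypothetical names, and TinyNet is a shrunken stand-in for ConvNet, so the layer sizes and the 8x8 input are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in for ConvNet with the same forward pattern."""

    def __init__(self, num_classes=8):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )
        self.fc1 = nn.Linear(4 * 3 * 3, 32)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = out.reshape(out.size(0), -1)
        out = F.relu(self.fc1(out))
        return self.fc(out)


class FeatureExtractor(nn.Module):
    """Hypothetical wrapper: same steps as the trained forward, minus fc."""

    def __init__(self, trained):
        super().__init__()
        self.trained = trained

    def forward(self, x):
        m = self.trained
        out = m.layer1(x)
        out = out.reshape(out.size(0), -1)  # keep the functional calls explicit
        return F.relu(m.fc1(out))


model = TinyNet()  # in the thread, the trained weights would be loaded here
extractor = FeatureExtractor(model).eval()

with torch.no_grad():
    x = torch.randn(2, 1, 8, 8)  # two fake 8x8 grayscale images
    features = extractor(x)
    logits = model(x)
print(features.shape)
```

Because the wrapper shares the trained submodules, no weights are copied and the call order is spelled out in one place.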

namduc · April 3, 2020, 9:11am

Thanks, I understand.
One more question: if the forward function looks like this:

def forward(self,x):
    out = self.layer1(x)
    out = nn.Flatten()(out)

and the sequential container looks like this:

nn.Sequential(layer[0] + [nn.Flatten()])

or like this:

nn.Sequential( [nn.Flatten()] + layer[0])

Which one is right?
I mean, I am having trouble with the order of the functional API calls when converting to nn.Sequential.

ptrblck · April 3, 2020, 8:25pm

For a single layer, this approach would work.
However, note that model.children() returns the child modules in the order they were initialized in the __init__, which might not be the same order they are called in the forward.
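
To illustrate the ordering caveat, here is a minimal sketch with a hypothetical module (OutOfOrder) whose layers are registered in the opposite order to how forward calls them. The layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


class OutOfOrder(nn.Module):
    """Hypothetical module whose __init__ order differs from its forward order."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 2)  # registered first, called last
        self.body = nn.Linear(8, 4)  # registered second, called first

    def forward(self, x):
        return self.head(self.body(x))


net = OutOfOrder()
names = [name for name, _ in net.named_children()]
print(names)  # registration order: ['head', 'body']

x = torch.randn(3, 8)
original = net(x)  # works: body runs before head

# Rebuilding from children() replays the registration order instead.
rebuilt = nn.Sequential(*net.children())
try:
    rebuilt(x)
    rebuilt_ok = True
except RuntimeError:
    rebuilt_ok = False  # head expects 4 features but receives 8
print(rebuilt_ok)
```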

namduc · April 4, 2020, 2:06pm

I have a different question.
I trained the model with batch_size = 8.
However, I want to test the trained model on a single image.
How do I achieve this without having to retrain the model with batch_size = 1?

ptrblck · April 5, 2020, 2:30am

You don’t have to retrain the model, but just set it to evaluation mode via model.eval().
This will make sure to e.g. disable dropout and use the estimated stats in batchnorm layers instead of the batch stats.
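
A short sketch of single-image inference in evaluation mode. The small network and the 8x8 random image are made up for illustration; the point is only that the image needs a leading batch dimension of 1 and that model.eval() makes the dropout and batchnorm layers deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small network containing both batchnorm and dropout.
net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),
    nn.BatchNorm2d(4),
    nn.ReLU(),
    nn.Flatten(),
    nn.Dropout(0.6),
    nn.Linear(4 * 6 * 6, 8),
)

image = torch.randn(1, 8, 8)  # one fake grayscale image: [channels, H, W]
batch = image.unsqueeze(0)    # add the batch dimension: [1, 1, 8, 8]

net.eval()  # dropout disabled, batchnorm uses its running statistics
with torch.no_grad():
    first = net(batch)
    second = net(batch)
print(first.shape, torch.equal(first, second))

net.train()  # for contrast: dropout is active again, so repeated calls can differ
with torch.no_grad():
    noisy_a = net(batch)
    noisy_b = net(batch)
print(torch.equal(noisy_a, noisy_b))
net.eval()  # switch back before any real inference
```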