Model throws error on input size when its last layer is removed

Hi, I have the following model:

import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*4*4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)  # flatten (N, C, H, W) -> (N, C*H*W) for the fc layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

I want to load a pre-trained file for it, remove the last FC layer and use it as a feature extractor by:

sourceCNN = nn.Sequential(*list(sourceCNN.children())[:-1])

But now sourceCNN.forward throws the error:

RuntimeError: matrices expected, got 4D, 2D tensors at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:1232

However, if I do not convert the model to a Sequential, it works perfectly. I have seen some existing forum posts where this kind of error is solved with .view, but I do not understand what exactly is going on: why do we need to call .view, and what should its arguments be?

Thanks!

Hmmm, before thinking about conceptual things, is it certain that .children() will return the children in the same order that you added them as object attributes? (I'm not saying it isn't, just posing the question; I'm not sure how such a guarantee would be implemented, in fact.)

Also, your Sequential would be missing the .view, as you correctly point out 🙂
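
If you do want a Sequential version, note that .children() only gives you the submodules, so the functional relu/max_pool calls from forward() go missing too, not just the .view. An untested sketch of one way to add everything back (Flatten and feature_extractor are names I'm making up; this assumes it runs before sourceCNN is overwritten by the nn.Sequential line, and that .children() returns the layers in definition order):

import torch.nn as nn

class Flatten(nn.Module):
    # collapses (N, C, H, W) into (N, C*H*W) so the Linear layers can consume it
    def forward(self, x):
        return x.view(x.size(0), -1)

layers = list(sourceCNN.children())[:-1]       # conv1, conv2, fc1, fc2 (fc3 dropped)
feature_extractor = nn.Sequential(
    layers[0], nn.ReLU(), nn.MaxPool2d(2),     # conv1 block
    layers[1], nn.ReLU(), nn.MaxPool2d(2),     # conv2 block
    Flatten(),                                 # the missing .view
    layers[2], nn.ReLU(),                      # fc1
    layers[3], nn.ReLU(),                      # fc2
)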

Why do we need a view?

The convolution layers work on tensors laid out something like (N, C, H, W), where:

  • N => batch size
  • C => number of channels/features
  • H => height of each image
  • W => width of each image

The fc layers work on much more 'squished' dimensions, something more like (N, C):

  • N => batch size
  • C => number of features/neurons

The .view converts from one layout to the other.
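
A tiny example of what that looks like shape-wise (the numbers here match your fc1 = 16*4*4):

import torch

x = torch.randn(8, 16, 4, 4)     # (N, C, H, W), as it comes out of the conv/pool stack
flat = x.view(x.size(0), -1)     # keep the batch dimension, flatten everything else
print(flat.shape)                # torch.Size([8, 256]) -- exactly what fc1 expects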

So, the original model looks like this in the debugger:

LeNet (
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (256 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)

After removing the last layer and converting to Sequential (the remaining layers are the same, only fc3 is gone):

Sequential (
  (0): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (1): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (2): Linear (256 -> 120)
  (3): Linear (120 -> 84)
)

Maybe there is a better way to remove the last layer than converting to Sequential, but that's what I was able to find on the forums.

So should I apply .view to the input I am passing to the model, or should I do something to the model itself? Thanks!

Why not just leave the model as-is, and monkey patch in a new forward method?

import types

def truncated_forward(self, x):
    out = F.relu(self.conv1(x))
    out = F.max_pool2d(out, 2)
    out = F.relu(self.conv2(out))
    out = F.max_pool2d(out, 2)
    out = out.view(out.size(0), -1)
    out = F.relu(self.fc1(out))
    out = F.relu(self.fc2(out))
    return out

# bind the function to the instance so `self` gets passed automatically
model.forward = types.MethodType(truncated_forward, model)

I haven't tried this, but I see no obvious reason why it wouldn't work.
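
After the patch, calling the model should hand back the fc2 activations directly, something like this (dummy 3x28x28 inputs, since fc1 expects 16*4*4; images and features are just illustrative names):

import torch

images = torch.randn(4, 3, 28, 28)   # dummy batch standing in for your real inputs
features = model(images)             # shape (4, 84): the fc2 activations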

(For that matter, are you loading this from a pickle? In that case, the pickle only stores the data; it doesn't store the actual method implementations. So, if you modify the LeNet class to remove the fc3 bit, only from forward, I think all should work OK?)
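
Roughly like this (a sketch with a made-up checkpoint filename; fc3 can stay in __init__ so the pickled weights still have somewhere to go, it just never gets called):

import torch

# LeNet.forward has been edited to return right after fc2 (no self.fc3(out) call)
sourceCNN = torch.load('lenet_checkpoint.pth')      # hypothetical filename
features = sourceCNN(torch.randn(4, 3, 28, 28))     # (4, 84): fc2 features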

Nice “monkey-patch”, it is working now. Thanks!

Cool. “monkey patch” is an actual ‘thing’ by the way, https://en.wikipedia.org/wiki/Monkey_patch

It's heavily used/usable in Python. It won't really work in C++. In Python, pretty much anything can be modified/monkey-patched at runtime, which is kind of nice (though dangerous…).

Just wanted to add a tip on your model for efficiency. In the particular case of ReLU and max pooling, you can reorder the operations: apply max_pool2d to the conv2d output first and then apply the ReLU activation. The order doesn't matter in this particular case because ReLU is monotonic, so the two orderings compute exactly the same result. So in forward, write the operations as:

out = F.relu(F.max_pool2d(self.conv1(x), 2))

Done in this order, the ReLU runs on the smaller, already-pooled tensor, so it requires substantially fewer computations and speeds training up a bit as a benefit, while the result is still the same.
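
A quick sanity check of that equivalence (ReLU is monotonic, so it commutes with taking the max):

import torch
import torch.nn.functional as F

x = torch.randn(4, 6, 24, 24)        # e.g. the shape of a conv1 output
a = F.max_pool2d(F.relu(x), 2)       # original order: relu, then pool
b = F.relu(F.max_pool2d(x, 2))       # reordered: pool, then relu
print(torch.allclose(a, b))          # True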
