I'm not getting the correct output from my model

Hello. I trained an autoencoder/decoder and saved the model. I loaded the model and took the decoder portion off in order to extract the features from the middle of the encoder. But when I run the encoder, it raises the following error:

AttributeError: 'tuple' object has no attribute 'dim'

I thought tensors flowed through the entire object but maybe not. How do I get tensors of features out of this encoder? Code below:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAutoencoder(nn.Module):

    def __init__(self):
        super(DenoisingAutoencoder, self).__init__()
                                                                # 32 x 32 x 3 (input)
        self.conv1e = nn.Conv2d(3, 24, 3, padding=2)            # 30 x 30 x 24
        self.conv2e = nn.Conv2d(24, 48, 3, padding=2)           # 28 x 28 x 48
        self.conv3e = nn.Conv2d(48, 96, 3, padding=2)           # 26 x 26 x 96
        self.conv4e = nn.Conv2d(96, 128, 3, padding=2)          # 24 x 24 x 128
        self.conv5e = nn.Conv2d(128, 256, 3, padding=2)         # 22 x 22 x 256
        self.mp1e   = nn.MaxPool2d(2, return_indices=True)      # 11 x 11 x 256

        self.mp1d   = nn.MaxUnpool2d(2)
        self.conv5d = nn.ConvTranspose2d(256, 128, 3, padding=2)
        self.conv4d = nn.ConvTranspose2d(128, 96, 3, padding=2)
        self.conv3d = nn.ConvTranspose2d(96, 48, 3, padding=2)
        self.conv2d = nn.ConvTranspose2d(48, 24, 3, padding=2)
        self.conv1d = nn.ConvTranspose2d(24, 3, 3, padding=2)

    def forward(self, x):
        # Encoder
        x = self.conv1e(x)
        x = F.relu(x)
        x = self.conv2e(x)
        x = F.relu(x)
        x = self.conv3e(x)
        x = F.relu(x)
        x = self.conv4e(x)
        x = F.relu(x)
        x = self.conv5e(x)
        x = F.relu(x)
        x, i = self.mp1e(x)

        # Decoder
        x = self.mp1d(x, i)
        x = self.conv5d(x)
        x = F.relu(x)
        x = self.conv4d(x)
        x = F.relu(x)
        x = self.conv3d(x)
        x = F.relu(x)
        x = self.conv2d(x)
        x = F.relu(x)
        x = self.conv1d(x)
        x = F.relu(x)

        return x

Then I bring it back and take off the decoder:

model = DenoisingAutoencoder()
model.load_state_dict(torch.load(r"C:\Users\jordan.howell\Pytorch\UW_files\roof_autoencoder.pt"))
# pop off the decoder
new_model = nn.Sequential(*list(model.children())[:-6])
model = nn.Sequential(*new_model)
model.feature_vec = nn.Sequential(nn.Linear(256, 256),
                                  nn.ReLU(),
                                  nn.Linear(256, 128))
model.cuda()
model.eval()

I’m not sure what I’m doing wrong.

Hi,

Would you have the exact stack trace of where this error comes from?


AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
     44
     45 with torch.no_grad():
---> 46     output = model(image)
     47 test_image_output = output.data.cpu().numpy()
     48 for i in test_image_output:

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
---> 547            result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
---> 547            result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
---> 547            result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
     85
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88
     89     def extra_repr(self):

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py in linear(input, weight, bias)
   1365         - Output: :math:`(N, *, out\_features)`
   1366     """
---> 1367    if input.dim() == 2 and bias is not None:
   1368         # fused op is marginally faster
   1369         ret = torch.addmm(bias, input, weight.t())

The problem is that the last layer of your encoder is a max pooling layer.
In your custom forward, you do x, i = self.mp1e(x) to get the output on one side and the indices on the other.
But with the Sequential module, that tuple of two outputs is passed directly to the Linear layer, which is not what it expects.
Also, you might need a reshape before the Linear layer to go from the 4D output of the convolutions to the 2D input of a Linear layer.
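
For instance, a minimal sketch of one way to run only the encoder layers, drop the pooling indices, and flatten before any Linear layer (the EncoderOnly wrapper name here is just illustrative, not code from the thread):

import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderOnly(nn.Module):
    """Wraps a trained DenoisingAutoencoder and returns flattened encoder features."""
    def __init__(self, autoencoder):
        super(EncoderOnly, self).__init__()
        self.ae = autoencoder

    def forward(self, x):
        x = F.relu(self.ae.conv1e(x))
        x = F.relu(self.ae.conv2e(x))
        x = F.relu(self.ae.conv3e(x))
        x = F.relu(self.ae.conv4e(x))
        x = F.relu(self.ae.conv5e(x))
        x, _ = self.ae.mp1e(x)           # keep the pooled output, discard the indices
        return x.view(x.size(0), -1)     # flatten (N, C, H, W) -> (N, C*H*W) for Linear layers

encoder = EncoderOnly(model)
features = encoder(torch.randn(1, 3, 32, 32))  # 2D tensor: one row of features per image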

Thank you @albanD. I took this notebook from a Udemy course. How would I flatten correctly, or is there a place in the docs where I can figure that out? Also, should I not pass the indices to the unpooling layer?

Didn't you just say that you removed the decoder? So you removed the unpooling layer, no?

You can see this reshaping in any CNN example, this one for instance.

Hi @albanD. I'm still having trouble with the tensor.view command. I thought I had it. If I bring in ResNet-50 and pop off the last linear layer, that leaves the last layer as an adaptive average pooling layer. With batches of 10 images, I'm getting a tensor of size (10, 2048, 1, 1). I understand the "10" means 10 images and the 2048 means 2048 features. If I want a feature set of 1000 flat features, how do I reshape this?

I tried the following:

CNNmodel = models.resnet50(pretrained=True)
for param in CNNmodel.parameters():
    param.requires_grad = False
model_change = nn.Sequential(*list(CNNmodel.children())[:-1])
CNNmodel = nn.Sequential(*model_change)
CNNmodel = nn.view(-1, 1, 1000)
CNNmodel = CNNmodel.cuda()
CNNmodel

That gave an error that nn does not have a module 'view'. I'm not sure how to add a flatten layer to this.

Hi,

(10, 2048, 1, 1) means that you have a batch of 10 images, with 2048 channels and a height and width of 1.
That is a very strange "image", but it is the output you get from the last convolution/spatial pooling layers.

To feed that into Linear layers, you want a 2D tensor of shape (batch_size, nb_features); in your case, out.view(-1, 2048).
If you want 1000 outputs in the end, you need to add another Linear layer that does this transformation (nn.Linear(2048, 1000), for example).
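
For example, a minimal sketch of that reshaping (the backbone/head names and the 224 x 224 input size are assumptions for illustration):

import torch
import torch.nn as nn
from torchvision import models

# ResNet-50 without its final Linear layer; the last layer is now the adaptive average pooling
backbone = nn.Sequential(*list(models.resnet50(pretrained=True).children())[:-1])
head = nn.Linear(2048, 1000)  # maps the flattened features to 1000 outputs

images = torch.randn(10, 3, 224, 224)          # batch of 10 RGB images
with torch.no_grad():
    feats = backbone(images)                   # shape (10, 2048, 1, 1)
    feats = feats.view(feats.size(0), 2048)    # flatten to (10, 2048)
    out = head(feats)                          # shape (10, 1000)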

So the original ResNet-50's last linear layer is (2048, 1000). So I guess if I want that vector for the image, I can leave it in, no? Or should I run it without that layer, or add an out.view(-1)?

Yes, you want to keep it if you need an output of size 1000.
You don't want to do out.view(-1), because that would mix the data from the different elements in the batch.
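
To see the difference on the shapes (the tensors here are just illustrative):

import torch

feats = torch.randn(10, 2048, 1, 1)
print(feats.view(-1).shape)                  # torch.Size([20480]) -- batch dimension is gone
print(feats.view(feats.size(0), -1).shape)   # torch.Size([10, 2048]) -- one row per image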