RuntimeError: size mismatch in densenet161

Uchchwas · September 16, 2020, 5:19pm

I have tried many times, but nothing I change affects this error. My code runs perfectly with VGG16 and ResNet, but when I try DenseNet this error happens.
Here is the transform code:

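    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence
    from torchvision import models, transforms

    # transform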
    train_transform = transforms.Compose([
                    transforms.Resize((224, 224)),
                    transforms.RandomHorizontalFlip(),
                    transforms.ToTensor(), 
                    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])])

    val_transform = transforms.Compose([
                    transforms.Resize((224, 224)),
                    transforms.ToTensor(), 
                    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])])

        densenet161 = getattr(models, dictionary['cnn_model'])(pretrained=True)

        #remove the last fc layer
        self.densenet161 = nn.Sequential(*list(densenet161.children())[:-1])

        for param in self.densenet161.parameters():
            param.requires_grad = False
        self.encoder_linear = nn.Linear(densenet161.classifier.in_features, dictionary['embed_size'])
        self.bn = nn.BatchNorm1d(dictionary['embed_size'])

        self.embedding = nn.Embedding(dictionary['vocab_size'], dictionary['embed_size'])
        self.rnn = getattr(nn, dictionary['rnn_model'])(dictionary['embed_size'], dictionary['hidden_size'], dictionary['num_layers'], batch_first=True)
        self.dropout = nn.Dropout(dictionary['dropout'])
        self.decoder_linear = nn.Linear(dictionary['hidden_size'], dictionary['vocab_size'])
        self.init_weights()
        

    def init_weights(self):
        """
        Randomly initialize weights for the embedding and linear layers.
        """
        init_value = 0.1
        self.embedding.weight.data.uniform_(-init_value, init_value)
        self.encoder_linear.weight.data.uniform_(-init_value, init_value)
        self.decoder_linear.weight.data.uniform_(-init_value,init_value)

    def forward(self, images, captions, lengths):
        ''' Extract features from input and pass output to LSTM'''
        ### ENCODER
        with torch.no_grad():
            #features = self.resnet(images)
            #features = self.vgg(images)
            #features = self.inception_v3(images)
            features = self.densenet161(images)

        features = features.reshape(features.shape[0], -1)
        features = self.bn(self.encoder_linear(features))

        ### DECODER
        embeddings = self.embedding(captions)  # (batch_size, seq_len, embed_size)
        # concat features and embeddings
        embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
        
        packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
        
        hiddens, _ = self.rnn(packed)

        outputs = self.dropout(hiddens[0])
        outputs = self.decoder_linear(outputs)


        return outputs
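A likely reason this works with ResNet but not DenseNet is how torchvision builds the two models. In torchvision's ResNet, `avgpool` is a registered child module, so dropping only the last child (`fc`) still keeps the global pooling. torchvision's DenseNet has just two children, `features` and `classifier`, and applies the final ReLU and `adaptive_avg_pool2d` inside `forward()`, so `children()[:-1]` keeps only the un-pooled feature map (`[N, 2208, 7, 7]` for a 224x224 input). The sketch below re-adds those steps as modules. The helper name `densenet_feature_extractor` and the `TinyDenseNetLike` stand-in are hypothetical, used only so the example runs without downloading pretrained weights.

```python
import torch
import torch.nn as nn


def densenet_feature_extractor(densenet: nn.Module) -> nn.Sequential:
    """Pooled DenseNet encoder without the classifier (hypothetical helper).

    torchvision's DenseNet registers only `features` and `classifier` as
    children; ReLU, global average pooling and flattening happen inside its
    forward(), so they are re-added here as modules.
    """
    return nn.Sequential(
        densenet.features,             # e.g. [N, 2208, 7, 7] for densenet161 at 224x224
        nn.ReLU(inplace=True),         # mirrors F.relu(features, inplace=True)
        nn.AdaptiveAvgPool2d((1, 1)),  # mirrors F.adaptive_avg_pool2d(out, (1, 1))
        nn.Flatten(1),                 # [N, C, 1, 1] -> [N, C]
    )


# Hypothetical stand-in with the same layout as torchvision's DenseNet
# (a `features` trunk plus a `classifier`), used only to check shapes.
class TinyDenseNetLike(nn.Module):
    def __init__(self):
        super().__init__()
        # stride 32 on a 224x224 input gives a 7x7 map, like DenseNet's trunk
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, stride=32), nn.BatchNorm2d(16))
        self.classifier = nn.Linear(16, 10)


encoder = densenet_feature_extractor(TinyDenseNetLike()).eval()
with torch.no_grad():
    print(encoder(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 16])
```

In the question's `__init__`, this would replace the `nn.Sequential(*list(densenet161.children())[:-1])` line. `densenet161.classifier.in_features` (2208) then matches the flattened feature size, and the existing `reshape` in `forward` leaves the `[N, 2208]` tensor unchanged.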

And the error is:

RuntimeError: size mismatch, m1: [8 x 108192], m2: [2208 x 512] at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THC/generic/THCTensorMathBlas.cu:290
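The numbers in this message can be decoded with a little arithmetic. `m1` is the input to the matrix multiply inside an `nn.Linear`, and `m2` is that layer's transposed weight, so `[2208 x 512]` corresponds to a layer with `in_features=2208` and `out_features=512`. That fits `encoder_linear` (2208 is `densenet161.classifier.in_features`; 512 is presumably `embed_size`). The second dimension of `m1` is 108192 = 2208 x 7 x 7, which is exactly what flattening an un-pooled `[8, 2208, 7, 7]` feature map produces. The sketch below reproduces this with random tensors; the shapes are assumed from the error message and no real model is involved.

```python
import torch
import torch.nn as nn

batch_size, channels, height, width = 8, 2208, 7, 7  # assumed from the error message
print(channels * height * width)  # 108192

# Random stand-in for an un-pooled DenseNet-161 feature map
unpooled = torch.randn(batch_size, channels, height, width)
flat = unpooled.reshape(unpooled.shape[0], -1)  # same reshape as in forward()
print(tuple(flat.shape))                         # (8, 108192)

encoder_linear = nn.Linear(channels, 512)        # 512 assumed to be embed_size
try:
    encoder_linear(flat)                         # reproduces the size mismatch
except RuntimeError as err:
    print("RuntimeError:", err)

# Global average pooling over H and W gives the 2208 features the layer expects
pooled = unpooled.mean(dim=(2, 3))
print(tuple(encoder_linear(pooled).shape))       # (8, 512)
```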

Thank you for the help; I am still very new to PyTorch.

ptrblck · September 18, 2020, 4:01am

Based on the error message, I guess the output of self.rnn might be too large for the following linear layer.
Unfortunately, the shapes are not defined in your code, so I cannot debug it.
Could you check the shape of hiddens and make sure it matches the expected shape of self.decoder_linear?
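One way to check these shapes without editing `forward` is to register forward pre-hooks that record the input shapes of each direct child module just before it runs; the last shape line recorded before the exception names the layer that received the wrong input. This is a minimal sketch: `describe` and `register_shape_printers` are hypothetical helper names, and the toy `nn.Sequential` only mimics a flatten-then-`Linear` mismatch. On the model from the question, `register_shape_printers(model)` would cover its direct children (`densenet161`, `encoder_linear`, `bn`, `embedding`, `rnn`, and so on), including the `PackedSequence` passed to the RNN.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import PackedSequence


def describe(x):
    """Readable shape summary for tensors, PackedSequences, and tuples/lists."""
    if isinstance(x, PackedSequence):  # check first: PackedSequence is a namedtuple
        return f"PackedSequence(data={tuple(x.data.shape)})"
    if isinstance(x, torch.Tensor):
        return tuple(x.shape)
    if isinstance(x, (tuple, list)):
        return [describe(v) for v in x]
    return type(x).__name__


def register_shape_printers(model: nn.Module, log=print):
    """Log each direct child's input shapes right before it runs."""
    handles = []
    for name, child in model.named_children():
        def hook(module, inputs, name=name):  # default arg binds the current name
            log(f"{name}: input {describe(inputs)}")
        handles.append(child.register_forward_pre_hook(hook))
    return handles


# Toy model (hypothetical) with the same kind of bug: a 16x8x8 map is
# flattened to 1024 features, but the Linear layer expects 16.
toy = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.Flatten(1), nn.Linear(16, 4))
messages = []
handles = register_shape_printers(toy, log=messages.append)
try:
    toy(torch.randn(2, 3, 8, 8))
except RuntimeError as err:
    messages.append(f"failed: {err}")
finally:
    for handle in handles:
        handle.remove()  # detach the hooks once debugging is done
print("\n".join(messages))
```

Pre-hooks are used rather than forward hooks because a forward hook only fires after a module returns, so it would never report the inputs of the layer that raised.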