Concatenate layer output with additional input data

(Raúl Gombru) #1

I want to build a CNN model that takes additional input data besides the image at a certain layer.
To do that, I plan to use a standard CNN model, take one of its last FC layers, concatenate it with the additional input data and add FC layers processing both inputs.


The code I need would be something like:

additional_data_dim = 100
output_classes = 2
model = models.__dict__['inception_v3']
# TODO: Concatenate the CNN layer with the additional data
model.fc1 = nn.Linear(2048 + additional_data_dim, 2048 + additional_data_dim)
model.fc2 = nn.Linear(2048 + additional_data_dim, output_classes)

How should I code that?



Here is a small example for your use case:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.cnn = models.inception_v3(pretrained=False, aux_logits=False)
        self.cnn.fc = nn.Linear(
            self.cnn.fc.in_features, 20)
        self.fc1 = nn.Linear(20 + 10, 60)
        self.fc2 = nn.Linear(60, 5)
    def forward(self, image, data):
        x1 = self.cnn(image)
        x2 = data
        x = torch.cat((x1, x2), dim=1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = MyModel()

batch_size = 2
image = torch.randn(batch_size, 3, 299, 299)
data = torch.randn(batch_size, 10)

output = model(image, data)

I chose arbitrary sizes for the linear layers, so you should replace them with your own constraints, such as additional_data_dim.
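
As a rough sketch of how the sizes from the question (additional_data_dim = 100, output_classes = 2) could be plugged in, the variant below takes the backbone and its feature size as arguments. The class name ImageWithExtraData and the tiny toy_backbone are hypothetical and only there to keep the example quick to run; they are not part of torchvision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageWithExtraData(nn.Module):
    """Concatenate backbone features with additional per-sample data."""

    def __init__(self, backbone, feature_dim, additional_data_dim, output_classes):
        super().__init__()
        # The backbone is expected to return a tensor of shape (batch, feature_dim).
        self.backbone = backbone
        joint_dim = feature_dim + additional_data_dim
        self.fc1 = nn.Linear(joint_dim, joint_dim)
        self.fc2 = nn.Linear(joint_dim, output_classes)

    def forward(self, image, data):
        features = self.backbone(image)
        # Join image features and extra data along the feature dimension.
        x = torch.cat((features, data), dim=1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)


# Hypothetical tiny backbone, used only so the sketch runs quickly.
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

model = ImageWithExtraData(toy_backbone, feature_dim=8,
                           additional_data_dim=100, output_classes=2)
image = torch.randn(4, 3, 32, 32)
data = torch.randn(4, 100)
output = model(image, data)
print(output.shape)
```

With inception_v3 as the backbone, one option would be to set cnn.fc = nn.Identity() so that the model returns its 2048 pooled features, and then pass feature_dim=2048.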

(Raúl Gombru) #3

Thank you! Understood, working on it!

(Abeer) #4

Hello, if I have two models and want to concatenate their outputs so that the last layer is shared, can I use the same approach?


You would have to pass your input(s) through both models and concat the outputs before the final layer.
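
A minimal sketch of that idea, assuming both models return 2D outputs of shape (batch, features). The names TwoBranchModel, branch_a, and branch_b are made up for illustration, and the two small Sequential branches stand in for whatever models you actually have.

```python
import torch
import torch.nn as nn


class TwoBranchModel(nn.Module):
    """Run two models and feed their concatenated outputs to one shared layer."""

    def __init__(self, model_a, model_b, dim_a, dim_b, num_classes):
        super().__init__()
        self.model_a = model_a
        self.model_b = model_b
        # Shared final layer that sees the outputs of both branches.
        self.classifier = nn.Linear(dim_a + dim_b, num_classes)

    def forward(self, input_a, input_b):
        out_a = self.model_a(input_a)
        out_b = self.model_b(input_b)
        # Concatenate per sample along the feature dimension.
        x = torch.cat((out_a, out_b), dim=1)
        return self.classifier(x)


# Hypothetical stand-in branches with output sizes 32 and 12.
branch_a = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
branch_b = nn.Sequential(nn.Linear(8, 12), nn.ReLU())
model = TwoBranchModel(branch_a, branch_b, dim_a=32, dim_b=12, num_classes=5)

out = model(torch.randn(2, 16), torch.randn(2, 8))
print(out.shape)
```

Since both branches are registered as submodules, a single optimizer over model.parameters() would update both models and the shared layer together.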

(Uk Jo) #6

Why is dim set to 1 in the concatenation here? I thought it would be 0.

(Justus Schock) #7

Dim 0 means that you treat the outputs as independent batch samples, since you increase the batch size. However, this is probably not what one would want in this case, since the results of both models belong to the same sample.

Using dim 1 concatenates along each sample's channels, which makes more sense because the outputs belong to the same sample. This approach can (unlike the first one) even be used if the models produce a different number of channels, although the remaining sizes must be equal.
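
To make the difference concrete, here is a small shape check. The tensor sizes (20 and 10 features, batch size 2) are just assumptions chosen for illustration.

```python
import torch

a = torch.randn(2, 20)  # e.g. output of the first model
b = torch.randn(2, 10)  # e.g. output of the second model
c = torch.randn(2, 20)  # another tensor with the same feature size as a

# dim=1: features of the same sample are joined, batch size is unchanged.
joined = torch.cat((a, b), dim=1)
print(joined.shape)

# dim=0: the batch grows, so the rows are treated as separate samples.
stacked = torch.cat((a, c), dim=0)
print(stacked.shape)

# dim=0 fails when the feature sizes differ (20 vs. 10).
try:
    torch.cat((a, b), dim=0)
    dim0_failed = False
except RuntimeError:
    dim0_failed = True
print(dim0_failed)
```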

(Uk Jo) #8

Thanks for the reply. So dim 0 is the batch dimension, the same as in Keras. Your reply helped me a lot :slight_smile:

(Tunesh Verma) #9

I am getting an error while concatenating two layers with different dimensions:
torch.Size([1, 256, 13, 13])
torch.Size([1, 512, 26, 26])

RuntimeError Traceback (most recent call last)
in ()
----> 1 s,d = a(inp)

D:\Softwares\anacond33\lib\site-packages\torch\nn\modules\ in call(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
–> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

in forward(self, x, CUDA)
51 print(map2.shape)
–> 53 x =, map2), dim=1)
54 outputs[i] = x

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 13 and 26 in dimension 2 at c:\programdata\miniconda3\conda-bld\pytorch_1533096106539\work\aten\src\th\generic/THTensorMath.cpp:3616
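
The error says that all sizes except dimension 1 have to match, and here the spatial sizes differ (13x13 vs. 26x26). One possible fix, sketched below, is to resize one feature map to the spatial size of the other before concatenating. The name map1 and the choice of nearest-neighbor upsampling are assumptions, since the model code is not shown; pooling the larger map down to 13x13 would be another option.

```python
import torch
import torch.nn.functional as F

# Random stand-ins with the two shapes printed above.
map1 = torch.randn(1, 256, 13, 13)
map2 = torch.randn(1, 512, 26, 26)

# Upsample the smaller map to the spatial size of the larger one (26x26).
target_size = tuple(map2.shape[2:])
map1_up = F.interpolate(map1, size=target_size, mode="nearest")

# Now only the channel dimension differs, so dim=1 concatenation works.
x = torch.cat((map1_up, map2), dim=1)
print(x.shape)
```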

(Rchavezj) #10

Why did you provide this code:

self.cnn.fc = nn.Linear(self.cnn.fc.in_features, 20)

I don’t see the variable being used inside the forward function.


self.cnn.fc is the “classifier layer” of the inception_v3 model.
I just replaced it with my own linear layer to change the number of output neurons.

The model is used in x1 = self.cnn(image). self.cnn.fc is thereby called inside the forward method of the inception_v3 model.
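
Here is a small sketch of that mechanism. TinyNet is a hypothetical stand-in for a torchvision model, written only to show that a layer assigned to the fc attribute is the one the model's own forward calls.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained model with an `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(12, 8), nn.ReLU())
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        x = self.features(x)
        # self.fc is looked up at call time, so a replaced layer is used here.
        return self.fc(x)


net = TinyNet()
x = torch.randn(2, 3, 2, 2)

before = net(x).shape
# Swap the classifier for a new layer with 20 outputs, as in MyModel above.
net.fc = nn.Linear(net.fc.in_features, 20)
after = net(x).shape
print(before, after)
```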