Concatenating image tensor and numerical data not matching up


I’m trying to incrementally build an image model whose output is concatenated with numerical data and categorical embedding layers. So far I’m only at the image + numerical data step. I followed this post: Concatenate layer output with additional input data, but I’m getting a dimension error.

Here is the error traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-255-8c805dc65c00> in <module>
---> 22         y_pred = combined_model(image, numerical_data)
     23         single_loss = loss_function(y_pred, label)
     24         aggregated_losses.append(single_loss)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\ in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

<ipython-input-250-945f067aedac> in forward(self, image, x_numerical)
     26         x1 = self.cnn(image)
     27         x2 = numerical_data
---> 28         x = torch.cat((x1, x2), dim = 1)
     29         x = F.relu(self.fc1(x))
     30         x = self.fc2(x)

RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 2 at C:/w/1/s/windows/pytorch/aten/src\THC/generic/

Here is my model object:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class Image_Embedd(nn.Module):

    def __init__(self):
        '''
        embedding_size: Contains the embedding size for the categorical columns
        num_numerical_cols: Stores the total number of numerical columns
        output_size: The size of the output layer or the number of possible outputs.
        layers: List which contains number of neurons for all the layers.
        p: Dropout with the default value of 0.5
        '''
        super(Image_Embedd, self).__init__()
        self.cnn = models.resnet50(pretrained=False)
        # replace the ResNet classifier so the CNN outputs 256 features
        self.cnn.fc = nn.Linear(self.cnn.fc.in_features, 256)
        # numerical_data is a global here; assumes it is [batch_size, num_numerical_cols]
        self.fc1 = nn.Linear(256 + numerical_data.shape[1], 256 + numerical_data.shape[1])
        self.fc2 = nn.Linear(256 + numerical_data.shape[1], 2)

    #define the forward method
    def forward(self, image, x_numerical):
        x1 = self.cnn(image)
        x2 = x_numerical
        # concatenate image features and numerical features along dim 1
        x = torch.cat((x1, x2), dim = 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = F.log_softmax(x, dim = 1)
        return x

Here is my model run:

epochs = 1
aggregated_losses = []

max_trn_batch = 25

for i in range(epochs):
    for b, (image, label, policy, categorical_data, numerical_data
            , categorical_embedding_sizes) in enumerate(train_loader):
        image = image.cuda()
        label = label.cuda()
        numerical_data = numerical_data.cuda()
        #print(image, label, categorical_data, numerical_data)
        #count batches
        b += 1
        #throttle the batches
        if b == max_trn_batch:
            break

        y_pred = combined_model(image, numerical_data)
        single_loss = loss_function(y_pred, label)
        # statistics
        running_loss += single_loss.item() * image.size(0)
        running_corrects += torch.sum(y_pred ==

        print(f'train-epoch: {i}, train-batch: {b}')


When I run:

for image, label, policy, categorical_data, numerical_data, categorical_embedding_sizes in train_loader:
    print(f"numeric size is {numerical_data.shape} \
          image size is {image.shape}")

I get:

numeric size is torch.Size([10, 110528, 8]) image size is torch.Size([10, 3, 224, 224])

I’m not sure how to properly concat those two sizes.

If there is anything else that will help, or if you see something glaring that I’m missing, I would appreciate the help. Thank you.

The output of the resnet will be [batch_size, 256] (your replaced self.cnn.fc layer has 256 output features), while x_numerical will be [batch_size, 110528, 8]. These are incompatible, since torch.cat requires all tensors to have the same number of dimensions.
How would you like to concatenate the 2-dimensional cnn output with the 3-dimensional numerical tensor?
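
To make the mismatch concrete, here is a minimal sketch with random stand-in tensors. The shapes follow the ones printed above, except that the middle dimension is shrunk from 110528 to 5 to keep it light, and the variable names are only illustrative. It assumes the intended layout is one row of numerical features per sample:

```python
import torch

batch_size = 10

# Stand-ins for the real tensors (random values, illustrative names).
cnn_out = torch.randn(batch_size, 256)          # 2-D: [batch, features]
bad_numerical = torch.randn(batch_size, 5, 8)   # 3-D: [batch, rows, cols]

try:
    torch.cat((cnn_out, bad_numerical), dim=1)
    cat_failed = False
except RuntimeError:
    # torch.cat rejects tensors with a different number of dimensions.
    cat_failed = True

# With one row of numerical features per sample, both tensors are 2-D
# and can be joined along the feature dimension.
good_numerical = torch.randn(batch_size, 8)     # 2-D: [batch, cols]
combined = torch.cat((cnn_out, good_numerical), dim=1)
print(cat_failed, combined.shape)
```

If the numerical tensor really is meant to be 3-dimensional, it would first have to be reduced or flattened to [batch_size, N] before concatenating, and the first linear layer sized to match.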

Thank you for the reply.

For the numerical data, I figured out that the problem was in the data loader / custom dataset: I was pulling every numerical value for every observation instead of only the values for the requested row. After changing
numerical_data = self.image_frame.loc[numerical_columns] to numerical_data = self.image_frame.loc[idx, numerical_columns], it concatenates just fine and runs.
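
A minimal sketch of the difference, using a small made-up DataFrame in place of self.image_frame (the column names and values are hypothetical, and the "all rows" selection is written as .loc[:, numerical_columns] purely for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for self.image_frame: 4 observations, 3 numerical columns.
image_frame = pd.DataFrame(
    np.arange(12, dtype=np.float32).reshape(4, 3),
    columns=["num_a", "num_b", "num_c"],
)
numerical_columns = ["num_a", "num_b", "num_c"]
idx = 2  # the index that __getitem__ receives

# No row index: every observation comes back, so each sample is 2-D.
all_rows = image_frame.loc[:, numerical_columns].to_numpy()

# With the row index: a single observation, so each sample is 1-D.
one_row = image_frame.loc[idx, numerical_columns].to_numpy(dtype=np.float32)

print(all_rows.shape, one_row.shape)
```

With the per-row selection, each sample contributes a 1-D vector of numerical features, so the default batching in DataLoader yields a 2-D [batch_size, num_numerical_cols] tensor that can be concatenated with the CNN output.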

Now I need to figure out how to properly add categorical embeddings to the model object.