Concatenating image tensor and numerical data not matching up

Jordan_Howell · December 26, 2019, 3:56pm

Hello,

I’m trying to slowly build an image model that is concatenated with numerical data and categorical embedding layers. That said, I’m only on the image+numerical data step. Following this post: Concatenate layer output with additional input data, I’m getting an error in the dimensions.

Here is the error traceback:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-255-8c805dc65c00> in <module>
     20 
     21 
---> 22         y_pred = combined_model(image, numerical_data)
     23         single_loss = loss_function(y_pred, label)
     24         aggregated_losses.append(single_loss)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

<ipython-input-250-945f067aedac> in forward(self, image, x_numerical)
     26         x1 = self.cnn(image)
     27         x2 = numerical_data
---> 28         x = torch.cat((x1, x2), dim = 1)
     29         x = F.relu(self.fc1(x))
     30         x = self.fc2(x)

RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 2 at C:/w/1/s/windows/pytorch/aten/src\THC/generic/THCTensorMath.cu:62

Here is my model object:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class Image_Embedd(nn.Module):

    def __init__(self, num_numerical_cols):
        '''
        Args
        ---------------------------
        num_numerical_cols: Stores the total number of numerical columns
        '''
        super(Image_Embedd, self).__init__()
        
        self.cnn = models.resnet50(pretrained=False)
        
        # replace the ResNet classifier head so the CNN outputs 256 features
        self.cnn.fc = nn.Linear(self.cnn.fc.in_features, 256)
        # layers applied after concatenating CNN features with the numerical columns
        self.fc1 = nn.Linear(256 + num_numerical_cols, 256 + num_numerical_cols)
        self.fc2 = nn.Linear(256 + num_numerical_cols, 2)
        
        
    #define the forward method
    def forward(self, image, x_numerical):
        
        x1 = self.cnn(image)
        x2 = x_numerical  # use the argument, not the global numerical_data
        x = torch.cat((x1, x2), dim = 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = F.log_softmax(x, dim = 1)
        return x

Here is my model run:

epochs = 1
aggregated_losses = []

max_trn_batch = 25

for i in range(epochs):
    for b, (image, label, policy, cateogrical_data, numerical_data
            , categorical_embedding_sizes) in enumerate(train_loader):
        image = image.cuda()
        label = label.cuda()
        numerical_data = numerical_data.cuda()
        #print(image, label, categorical_data, numerical_data)
        
        #count batches
        b += 1
        
        #throttle the batches
        if b == max_trn_batch:
            break
        

        y_pred = combined_model(image, numerical_data)
        single_loss = loss_function(y_pred, label)
        aggregated_losses.append(single_loss)
        
        # statistics
        running_loss += single_loss.item() * image.size(0)
        # compare predicted class indices (not log-probabilities) with the labels
        running_corrects += torch.sum(y_pred.argmax(dim=1) == label.data)
        


        print(f'train-epoch: {i}, train-batch: {b}')

        optimizer.zero_grad()
        single_loss.backward()
        optimizer.step()

When I run:

for image, label, policy, cateogrical_data, numerical_data, categorical_embedding_sizes in train_loader: 
    print(f"numeric size is {numerical_data.shape} \
          image size is {image.shape}")
    break

I get:

numeric size is torch.Size([10, 110528, 8]) image size is torch.Size([10, 3, 224, 224])

I’m not sure how to properly concat those two sizes.

If there is anything else that will help, or if you see something glaring that I’m missing, I would appreciate the help. Thank you.

ptrblck · December 27, 2019, 3:47am

The output of the resnet will be [batch_size, 256] (the replaced fc layer outputs 256 features; a stock resnet50 would give [batch_size, 1000]), while x_numerical will be [batch_size, 110528, 8], which is incompatible, since these tensors do not have the same number of dimensions.
How would you like to concatenate the 2-dimensional cnn output with the 3-dimensional numerical tensor?
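
A minimal sketch of the mismatch, assuming a 256-feature CNN output and using 50 as a small stand-in for the 110528 rows (random tensors, not the real data):

```python
import torch

batch_size = 10
cnn_out = torch.randn(batch_size, 256)      # 2-D: stands in for the CNN output
numerical = torch.randn(batch_size, 50, 8)  # 3-D: 50 is a stand-in for 110528

# Concatenating a 2-D and a 3-D tensor raises a RuntimeError
try:
    torch.cat((cnn_out, numerical), dim=1)
    cat_failed = False
except RuntimeError as err:
    cat_failed = True
    print(f"cat failed: {err}")

# With one row of 8 numerical features per sample, both tensors are 2-D
numerical_per_sample = torch.randn(batch_size, 8)
combined = torch.cat((cnn_out, numerical_per_sample), dim=1)
print(combined.shape)  # torch.Size([10, 264])
```

With dim=1, only the feature dimension may differ between the tensors; the batch dimension has to match.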

Jordan_Howell · December 27, 2019, 11:39am

Thank you for the reply.

For the numerical data, I figured out the problem was in the data loader/custom dataset. I was pulling every numerical value for every observation. After changing
numerical_data = self.image_frame.loc[numerical_columns] to numerical_data = self.image_frame.loc[idx, numerical_columns], it concatenates just fine and runs.
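
Roughly what the fixed indexing looks like in isolation. This is only a sketch: the dataset class, column names, and random frame are hypothetical, and the image and label handling is left out.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class NumericOnlyDataset(Dataset):
    """Hypothetical stand-in for the custom dataset (images/labels omitted)."""

    def __init__(self, frame, numerical_columns):
        self.image_frame = frame.reset_index(drop=True)
        self.numerical_columns = numerical_columns

    def __len__(self):
        return len(self.image_frame)

    def __getitem__(self, idx):
        # Select ONE row (idx) and only the numerical columns -> shape [num_cols]
        row = self.image_frame.loc[idx, self.numerical_columns]
        return torch.tensor(row.to_numpy(dtype=np.float32))

numerical_columns = [f"num_{i}" for i in range(8)]  # made-up column names
frame = pd.DataFrame(np.random.rand(40, 8), columns=numerical_columns)

loader = DataLoader(NumericOnlyDataset(frame, numerical_columns), batch_size=10)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([10, 8])
```

Each sample is now a 1-D tensor of 8 values, so the default collate function stacks a batch into [batch_size, 8], which matches the 2-D CNN output.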

Now I need to figure out how to properly add categorical embeddings in the model object.
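
One common way to wire in the embeddings, as a rough sketch rather than anything settled in this thread: one nn.Embedding per categorical column, concatenated with the CNN features and the numerical columns. The class name, the (n_categories, emb_dim) pairs, and the tiny CNN used here in place of resnet50 are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageNumCat(nn.Module):
    """Hypothetical combined model: CNN features + numerical + embeddings."""

    def __init__(self, cnn, cnn_out_features, num_numerical_cols,
                 embedding_sizes, n_classes=2):
        super().__init__()
        self.cnn = cnn
        # One embedding table per categorical column: (n_categories, emb_dim)
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n_cat, dim) for n_cat, dim in embedding_sizes]
        )
        emb_total = sum(dim for _, dim in embedding_sizes)
        in_features = cnn_out_features + num_numerical_cols + emb_total
        self.fc1 = nn.Linear(in_features, in_features)
        self.fc2 = nn.Linear(in_features, n_classes)

    def forward(self, image, x_numerical, x_categorical):
        x_img = self.cnn(image)  # [batch, cnn_out_features]
        # Column i of x_categorical holds integer category ids for embedding i
        x_emb = [emb(x_categorical[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat([x_img, x_numerical] + x_emb, dim=1)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

# Tiny CNN used only to keep the sketch light; swap in the resnet50 in practice
tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 16),
)
model = ImageNumCat(tiny_cnn, cnn_out_features=16, num_numerical_cols=8,
                    embedding_sizes=[(5, 3), (12, 6)])

image = torch.randn(10, 3, 32, 32)
x_numerical = torch.randn(10, 8)
x_categorical = torch.stack(
    [torch.randint(0, 5, (10,)), torch.randint(0, 12, (10,))], dim=1
)
out = model(image, x_numerical, x_categorical)
print(out.shape)  # torch.Size([10, 2])
```

The categorical tensor has to hold integer ids (dtype long), one column per embedding, and every id must be smaller than that column's n_categories.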