Hello,
I’m gradually building a model that combines a CNN image branch with numerical data and categorical embedding layers. For now, though, I’m only at the image + numerical data step. Following this post: Concatenate layer output with additional input data, I’m getting a dimension-mismatch error.
Here is the error traceback:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-255-8c805dc65c00> in <module>
20
21
---> 22 y_pred = combined_model(image, numerical_data)
23 single_loss = loss_function(y_pred, label)
24 aggregated_losses.append(single_loss)
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
<ipython-input-250-945f067aedac> in forward(self, image, x_numerical)
26 x1 = self.cnn(image)
27 x2 = numerical_data
---> 28 x = torch.cat((x1, x2), dim = 1)
29 x = F.relu(self.fc1(x))
30 x = self.fc2(x)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 2 at C:/w/1/s/windows/pytorch/aten/src\THC/generic/THCTensorMath.cu:62
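As far as I can tell, the error itself is just torch.cat complaining that the two tensors don’t have the same number of dimensions. This tiny standalone snippet (illustrative shapes only) raises the same RuntimeError:

import torch

a = torch.randn(10, 256)        # 2-D, like the resnet output
b = torch.randn(10, 110528, 8)  # 3-D, like my numerical_data batch
x = torch.cat((a, b), dim=1)    # RuntimeError: Tensors must have same number of dimensions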
Here is my model class:
class Image_Embedd(nn.Module):

    def __init__(self):
        '''
        Args
        ---------------------------
        embedding_size: Contains the embedding size for the categorical columns
        num_numerical_cols: Stores the total number of numerical columns
        output_size: The size of the output layer or the number of possible outputs.
        layers: List which contains number of neurons for all the layers.
        p: Dropout with the default value of 0.5
        '''
        super(Image_Embedd, self).__init__()

        self.cnn = models.resnet50(pretrained=False)
        self.cnn.fc = nn.Linear(self.cnn.fc.in_features, 256)
        self.cnn.fc1 = nn.Linear(256 + numerical_data.shape, 256 + numerical_data.shape)
        self.cnn.fc2 = nn.Linear(256 + numerical_data.shape, 2)
    #define the forward method
    def forward(self, image, x_numerical):
        x1 = self.cnn(image)
        x2 = numerical_data
        x = torch.cat((x1, x2), dim = 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = F.log_softmax(x)
        return x
Here is my training loop:
epochs = 1
aggregated_losses = []
max_trn_batch = 25

for i in range(epochs):
    for b, (image, label, policy, categorical_data, numerical_data
            , categorical_embedding_sizes) in enumerate(train_loader):
        image = image.cuda()
        label = label.cuda()
        numerical_data = numerical_data.cuda()
        #print(image, label, categorical_data, numerical_data)

        #count batches
        b += 1

        #throttle the batches
        if b == max_trn_batch:
            break

        y_pred = combined_model(image, numerical_data)
        single_loss = loss_function(y_pred, label)
        aggregated_losses.append(single_loss)

        # statistics
        running_loss += single_loss.item() * image.size(0)
        running_corrects += torch.sum(y_pred == label.data)

        print(f'train-epoch: {i}, train-batch: {b}')

        optimizer.zero_grad()
        single_loss.backward()
        optimizer.step()
When I run:
for image, label, policy, categorical_data, numerical_data, categorical_embedding_sizes in train_loader:
    print(f"numeric size is {numerical_data.shape} \
image size is {image.shape}")
    break
I get:
numeric size is torch.Size([10, 110528, 8]) image size is torch.Size([10, 3, 224, 224])
I’m not sure how to properly concatenate tensors with those two shapes.
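The only idea I’ve come up with so far (purely a guess on my part, assuming the extra dimension should just be flattened into the feature axis) is to reshape the numerical batch to 2-D before the concat, something like:

# guess: flatten [batch, 110528, 8] -> [batch, 110528 * 8] so both tensors are 2-D
x2 = x_numerical.reshape(x_numerical.size(0), -1)
x = torch.cat((x1, x2), dim=1)   # x1 is [batch, 256] from the CNN

but 110528 * 8 features per sample seems far too large to feed into a linear layer, so I suspect something is also off with how I’m loading the numerical data in the first place.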
If there is any other information that would help, or if you see something glaring that I’m missing, I would appreciate it. Thank you.