How to combine tabular and image data?

I am trying to build a classifier that combines image data and tabular data. So far, I have trained a resnet50 on RGB images to predict one of 783 classes.

In addition to the image dataset, I also have some tabular data (a .csv file) that I would like to use, namely the country and zone (~continent) in which each image was taken. There are 188 countries and 8 zones. Imagine these are vectors already, e.g. one-hot encoded. How could I add this information to the fully connected layer of resnet50 and fine-tune the model?

I can picture the process in my head, but I struggle to code it… What should my next steps be?

I’m not sure you could add this new information to a linear layer inside the resnet, as the shapes might not fit and retraining that layer might destroy the processing of the image features.
In general, you could pass these additional features through a new model, concatenate the penultimate activations of this new model with those of the resnet, and feed the result to a final classifier.
Users on this board have tried this approach a couple of times. Often the model tends to focus on only one feature branch, and I’m not aware of a good workaround besides normalizing the activation tensors to the same value range.
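
To illustrate the normalization point, here is a minimal sketch that brings two activation tensors to a comparable range before concatenating them. The tensors are random stand-ins, the sizes 2048 and 64 are assumptions, and layer normalization is only one possible choice (it is not prescribed above).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical penultimate activations with very different scales.
img_feat = torch.randn(4, 2048) * 10.0  # stand-in for resnet features
tab_feat = torch.randn(4, 64) * 0.1     # stand-in for tabular-branch features

# Normalize each sample's feature vector to zero mean and unit variance,
# separately per branch, so neither branch dominates by scale alone.
img_n = F.layer_norm(img_feat, img_feat.shape[1:])
tab_n = F.layer_norm(tab_feat, tab_feat.shape[1:])

# Concatenate along the feature dimension for the final classifier.
fused = torch.cat([img_n, tab_n], dim=1)
print(fused.shape)
```

Inside a trainable model, `nn.LayerNorm` or `nn.BatchNorm1d` modules on each branch would play the same role.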


Interesting, thanks a lot @ptrblck! Could you please add a link to some code, e.g. from these other users? I believe it would help me get things much clearer.

Here is a simple example.