Predicting the category of a text from the category of its associated image

I have a CSV file in which each line refers to an image, labeled with one of 16 image categories, and to some text, labeled with one of 9 text categories. By adapting the PyTorch transfer learning tutorial to my needs, I am able to predict which of the 16 categories each test image belongs to. How can I predict which of the 9 text categories the text on a given CSV line belongs to, based on the category of the image on that same line? I am not sure what approach to take or how to implement it, and I do not know whether there is a technical term for this kind of problem. Any guidance is much appreciated.
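To make the setup concrete, here is a minimal sketch of one simple baseline, not a definitive answer: estimate the conditional distribution of text category given image category from the labeled CSV lines, then predict the most frequent text category for each image category. The column names `image_category` and `text_category`, the toy rows, and the helper `predict_text_category` are all assumptions made up for illustration; the real CSV layout is not shown here.

```python
import pandas as pd

# Toy stand-in for the CSV; column names and values are hypothetical.
df = pd.DataFrame({
    "image_category": ["a", "a", "a", "b", "b", "c"],
    "text_category":  ["x", "x", "y", "y", "y", "z"],
})

# Rows: image category, columns: text category.
# normalize="index" makes each row sum to 1, i.e. an estimate of P(text | image).
cond = pd.crosstab(df["image_category"], df["text_category"], normalize="index")

# For each image category, pick the text category with the highest estimated probability.
best_text_for_image = cond.idxmax(axis=1).to_dict()


def predict_text_category(image_category):
    """Return the most frequent text category seen with this image category."""
    return best_text_for_image[image_category]
```

In this framing, the image category predicted by the image model would be passed to `predict_text_category`. The rows of `cond` also show how informative the image category is: if they are all close to uniform, the image category alone says little about the text category.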