tl;dr = How can I set up a dataset that contains multiple continuous target variables that I want to predict on one at a time?
I am trying to predict continuous biological traits (e.g., age, BMI, blood pressure) from retinal fundus images. I have a pretrained transformer model that I would like to fine-tune on many of these continuous labels. The dataset I am using to train, validate, and test on covers ~200,000 individuals.

The issues I am running into are that: 1) most of the tutorials I find online cover classification with binary or small multi-class label sets; and 2) the commonly recommended way to create a dataset is to save the train/test and control/condition images into separate folders. Since I am working with continuous targets and the dataset is more than 0.5 TB, I was thinking I should build a pipeline that digests a tabular dataframe containing columns for the continuous target variables (e.g., age, BMI, blood pressure), a column indicating whether each row is train or test data, and a column holding the path to the photo.

Regardless of whether this tabular metadata is a good way to set up a custom dataset for PyTorch transfer learning, does anyone have a good way to organize a custom image-based dataset for predicting continuous target variables? Thank you in advance for any help.
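For reference, here is a minimal sketch of what I mean by a metadata-driven dataset: a `torch.utils.data.Dataset` subclass that wraps a dataframe and loads each image from its path column on demand. The column names (`image_path`, `split`, `age`, etc.) and the class name are made up for illustration; the key idea is selecting one target column per fine-tuning run and returning a float32 scalar suitable for a regression loss.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset
from PIL import Image


class FundusRegressionDataset(Dataset):
    """Loads images lazily from paths stored in a metadata dataframe.

    Assumed (hypothetical) columns: "image_path", "split" ("train"/"test"),
    and one column per continuous target (e.g. "age", "bmi").
    """

    def __init__(self, metadata: pd.DataFrame, target_col: str,
                 split: str = "train", transform=None):
        # Keep only rows for the requested split, and drop rows with a
        # missing value in the chosen target so the loss never sees NaNs.
        df = metadata[metadata["split"] == split]
        self.df = df.dropna(subset=[target_col]).reset_index(drop=True)
        self.target_col = target_col
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row["image_path"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        # Float32 scalar target for regression (e.g. nn.MSELoss / nn.L1Loss).
        target = torch.tensor(row[self.target_col], dtype=torch.float32)
        return image, target
```

A new fine-tuning run on a different trait would then only need a different `target_col` argument, with the image files and metadata table left untouched.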