Predicting float values based off image input

My goal is as above, to train a model to take images as input and predict a variable (ideally off a regression) based on the image. I previously used ResNet for classification, but my understanding is that the best approach for this would be to design a Sequential NN from scratch, but I don’t know where to start. What would be the best model structure for this that would allow me to load the variable as well as images? Also, would using an ImageFolder work for leading data or is that oriented towards classification?

I’m unsure why a ResNet wouldn’t work as you could still replace the last linear layer (the original classifier) and add a new linear layer returning the regression outputs.

Why would you want to pass the regression target to the model instead of treating it as a target and pass it to the loss function?

ImageFolder is used for classification use cases as it will assign a class label to each folder.

I attempted that originally, but I may not have implemented it properly. Here is what I tried

model.fc = nn.Linear(model.fc.in_features, 1)

But the model was giving me very poor accuracy.

Also, I was a little confused about the model structure, and yes, there is no need to load the variable directly. I asked that because I was writing a custom data loader to normalize the variable, but that is all resolved now.