Dynamic heads in a model: idea, intuition, and questions

I am experimenting with EfficientNet for a regression task (3 dependent variables). The training dataset has images plus some metadata that I would like to use; however, the test set will only have images. So if my model takes image and metadata as inputs, and the test cases have no metadata, I will need additional model(s) to predict the metadata from the image and then feed it into my final model to predict the targets.

Zooming into the metadata, things get a bit more interesting. Say the metadata fields are A, B, and C. We are then looking at 3 sub-models, each taking the image as a constant input plus a varying amount of metadata. For example, to predict A, the image alone is sufficient; to predict B, the image and A are required; and so on, in a hierarchical fashion. So the metadata length differs per sub-model, and the loss functions and activations differ too, since some metadata targets are regression, some categorical, some one-hot, etc.

So it all comes down to the classifier head of the EfficientNet.

Of course, I could create 4 different models (1 for the final regression and 3 sub-models for the metadata).

However, I am experimenting with a single model with dynamically changing classifier heads. I am using LazyLinear so that the input length does not have to be specified, which solves the varying-input-length issue.
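To make the idea concrete, here is a minimal sketch of what I mean (illustrative only, not my actual code): a stand-in backbone with a swappable head built from `LazyLinear`, so the concatenated image-feature + metadata length never has to be declared up front.

```python
import torch
import torch.nn as nn

class DynamicHeadModel(nn.Module):
    def __init__(self, feature_dim=1280):
        super().__init__()
        # Stand-in for an EfficientNet feature extractor.
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, feature_dim))
        self.head = None  # built per task

    def set_head(self, out_features, activation=None):
        # LazyLinear infers its in_features on the first forward pass,
        # so the head works for any image-feature + metadata length.
        layers = [nn.LazyLinear(64), nn.ReLU(), nn.Linear(64, out_features)]
        if activation is not None:
            layers.append(activation)
        self.head = nn.Sequential(*layers)

    def forward(self, image, metadata=None):
        features = self.backbone(image)
        if metadata is not None:
            features = torch.cat([features, metadata], dim=1)
        return self.head(features)

model = DynamicHeadModel()
model.set_head(out_features=1)  # head for metadata A: image only
out_a = model(torch.randn(4, 3, 8, 8))

model.set_head(out_features=3)  # head for the final regression: image + metadata
out_final = model(torch.randn(4, 3, 8, 8), metadata=torch.randn(4, 5))
```

Here `out_a` has shape `(4, 1)` and `out_final` has shape `(4, 3)`, even though the two heads saw different input widths.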

So I have a JSON structure with the required details of each model (loss function, final activation, etc.), and the dataset is prepared accordingly: it returns the image, the metadata (whose length depends on the model), and the targets.
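Something along these lines (the field names here are illustrative, not my actual JSON schema): each entry declares the output size, final activation, and loss, and a small builder turns the spec into a head.

```python
import torch.nn as nn

# Hypothetical per-task spec; in practice this would be loaded from JSON.
HEAD_SPECS = {
    "meta_A": {"out": 1, "activation": None, "loss": nn.MSELoss()},
    "meta_B": {"out": 5, "activation": nn.LogSoftmax(dim=1), "loss": nn.NLLLoss()},
    "final":  {"out": 3, "activation": None, "loss": nn.MSELoss()},
}

def build_head(spec):
    """Build a classifier head and its loss from a declarative spec."""
    layers = [nn.LazyLinear(64), nn.ReLU(), nn.Linear(64, spec["out"])]
    if spec["activation"] is not None:
        layers.append(spec["activation"])
    return nn.Sequential(*layers), spec["loss"]

head, loss_fn = build_head(HEAD_SPECS["meta_B"])
```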

The whole objective is to have a single model with dynamically changing heads that infuses dynamically changing metadata of different lengths: a sort of declarative, just-in-time model (maybe like a human brain…).

My questions are:

  1. Is this approach practical?
  2. What happens when I change the classifier head (only the metadata part changes) in eval mode? To ensure everything matches the training, after all batches of an epoch I save the model's state; then, in eval mode, after swapping in the classifier head for the eval metadata (my intuition is that a freshly swapped-in head is initialized with random weights), I reload the previously saved state and run the eval predictions. I do this because I don't know how to access the weights and biases of the classifier head directly, since it is a dynamically created nn.Sequential. Is this OK? Is there a better way to do it?

I hope I was clear in describing my task. Expert comments are much appreciated; otherwise I will adopt the conventional multi-modal approach and get on with life. Thank you in advance.

This was not a very clear explanation of what you are doing or want to do.

Questions:

  • You have a training dataset consisting of images and metadata, but you are saying that your test dataset consists only of images?
  • It seems you want to use this metadata (which is of variable length) to merge with your image classifier’s predictions?
  • You are experimenting with a model that uses dynamically changing classifier heads? I’m confused here as I thought you were doing regression per “EfficientNet for a regression task(3 dependent variables)”
    • Are you trying to say that you switch between three different classifier heads that you have trained separately in order to solve this problem?
  • You describe the requirements for predicting A and B, but never talk about C.

Answering your questions:

  1. “Is this approach practical” depends entirely on its use case. If you plan on putting this into some production scenario that is heavily constrained on time or compute then possibly not. If you’re doing it on your own laptop for fun, then sure, the world is your oyster.
  2. Not entirely sure I understand what you’re asking here; however, I can explain what eval mode means. `eval()` is responsible for switching off behavior in layers that act differently during training (think dropout: you don’t want dropout to occur in your actual trained model, it should only happen during training as a regularization method). https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch
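A quick self-contained demonstration of the point (dropout only, which is the most common case):

```python
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

layer.train()
train_out = layer(x)  # roughly half the entries zeroed, the rest scaled up

layer.eval()
eval_out = layer(x)   # identity: dropout is disabled in eval mode
# torch.equal(eval_out, x) -> True
```

Importantly, `eval()` does not touch your weights at all; it only changes the forward-pass behavior of layers like dropout and batch norm.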

Suggestions:

  • I have no concept of what you’re trying to do, but from the questions you’re asking I imagine you’re doing way too much. I think this problem can probably be solved with a much simpler architecture than you imagine. If you can give us a netron view of your model and a look at the training loop, or some kind of workflow that describes how/why you’re having to train several different model heads based on the different types of data, then we can give you more help.

I really can’t offer any help beyond this, still didn’t really fully understand what you’re asking.

In the future, I recommend writing down your problem beforehand and being a bit more concise with your issues. Happy to help, I just don’t know how to do so if I don’t know what’s going on.