Which pre-trained model should be used for a binary image document classifier?

I want to train a binary classifier that classifies image documents into respective classes like child-optic / not-child-optic.

Dataset: Image documents (child optic data)

Which pre-trained model can be used in this case, and where can I find it?

Hello Aarya, can you elaborate more about your dataset?

Maybe my answer is redundant, but you can use a pretrained model with transfer learning, for example ResNet. You can find an example here: Transfer Learning for Computer Vision Tutorial — PyTorch Tutorials 1.12.1+cu102 documentation

Thank you for your response @islomjon
Dataset: binary classes:
KO_false - 1800 images
KO_true - 1600 images

The dataset is fairly balanced, so it should be fine, right?

@islomjon Which one do you think will best suit my situation with this dataset?

These two major transfer learning scenarios look as follows:

  • Finetuning the convnet: Instead of random initialization, we initialize the network with a pretrained network, like one trained on the ImageNet 1000 dataset. The rest of the training looks as usual.
  • ConvNet as fixed feature extractor: Here, we freeze the weights of the entire network except those of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained (a sketch of both scenarios follows below).
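
A minimal sketch of both scenarios using torchvision's ResNet-18 (the binary head for KO_true / KO_false and the optimizer settings are illustrative assumptions, not taken from the tutorial):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # KO_true / KO_false

# Scenario 1: finetune the whole convnet.
model_ft = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model_ft.fc = nn.Linear(model_ft.fc.in_features, num_classes)  # new binary head
optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=1e-3, momentum=0.9)

# Scenario 2: ConvNet as a fixed feature extractor.
model_fe = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model_fe.parameters():
    param.requires_grad = False                                 # freeze the backbone
model_fe.fc = nn.Linear(model_fe.fc.in_features, num_classes)   # new head, trainable
optimizer_fe = torch.optim.SGD(model_fe.fc.parameters(), lr=1e-3, momentum=0.9)

criterion = nn.CrossEntropyLoss()
```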

@Aarya Both approaches are fine to use, but in my experience (in both the vision and NLP domains) finetuning the whole network (Scenario 1) outperforms the approach where only the top layer is fine-tuned. However, the former requires more resources in terms of time and compute.
The reason is fairly obvious: a larger number of weights gets fine-tuned to the dataset at hand.

Some pointers -
Generally, the 2nd scenario works well only when the dataset at hand comes from at least a similar domain as the dataset that was used to pre-train the network.
Remember to work with a reduced learning rate here.

In case the domains are entirely different and you need to do what is called “domain adaptation” to your dataset, the 1st scenario should give better results.

Also note that even across different domains, the layers close to the input have been shown to learn nearly the same set of (low-level) features, so this knowledge is transferable across domains. You could put it like this: the number of layers to keep frozen is inversely related to the similarity between the dataset used to pre-train and the dataset at hand.
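
As a middle ground, here is a rough sketch, assuming torchvision's ResNet-18 (the module names conv1, bn1, layer1, layer2 come from that implementation): freeze the early blocks, fine-tune the deeper ones and the new head, and use a reduced learning rate as mentioned above.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the low-level blocks, which tend to learn generic edge/texture
# features that transfer across domains.
for module in [model.conv1, model.bn1, model.layer1, model.layer2]:
    for param in module.parameters():
        param.requires_grad = False

# New binary head; the deeper blocks (layer3, layer4) stay trainable.
model.fc = nn.Linear(model.fc.in_features, 2)

# Pass only the trainable parameters to the optimizer, with a reduced learning rate.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-4, momentum=0.9)
```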

Additionally, other factors like the size of the dataset might also play a role in deciding which scenario to choose.

Hope this helps.