Is it possible to use a pre-trained image multi-label classification model?

I know that nowadays people usually fine-tune a pre-trained model to train their own models. A few famous models that have been trained on the ImageNet dataset are ResNet, Inception, and VGG.

As you may know, the ImageNet dataset poses a multi-class classification problem, so we usually use softmax with cross-entropy to train the model. I usually use the pre-trained ResNet50 or VGG19 from the TorchVision models.
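For reference, here is a minimal sketch of that standard multi-class setup (the dummy batch and label tensor are just for illustration; newer TorchVision versions use the `weights=` argument instead of `pretrained=True`):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# ResNet50 pretrained on ImageNet (1000 classes).
model = models.resnet50(pretrained=True)

# Multi-class setup: CrossEntropyLoss applies log-softmax internally,
# so the model outputs raw logits of shape [batch_size, 1000].
criterion = nn.CrossEntropyLoss()

x = torch.randn(2, 3, 224, 224)        # dummy image batch
target = torch.randint(0, 1000, (2,))  # one class index per sample
loss = criterion(model(x), target)
```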

I just wonder: suppose we had a large multi-label dataset instead (say 10M images) with more than 10k labels (10x more labels than ImageNet). I could then build a CNN model and train it on that dataset using sigmoid with binary cross-entropy.
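In PyTorch terms, the training setup I have in mind would look roughly like this (the 10k label count and the ResNet50 backbone are just assumptions for the sketch):

```python
import torch
import torch.nn as nn
import torchvision.models as models

num_labels = 10_000  # hypothetical multi-label vocabulary

# Start from the same backbone and swap the classifier head.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_labels)

# Multi-label setup: BCEWithLogitsLoss applies the sigmoid internally
# and treats each label as an independent binary problem.
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(2, 3, 224, 224)
# Targets are multi-hot vectors: each image can have several labels.
target = torch.randint(0, 2, (2, num_labels)).float()
loss = criterion(model(x), target)
```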

Would it be a good idea to use that new checkpoint for transfer learning? If yes, when should we use that scenario and when shouldn't we? If not, why not?

Using a pretrained model shouldn’t hurt your training, as you could see it as a “good initialization” of the parameters in comparison to a random initialization.

In my experience, the worst case is that the loss blows up and you need to retrain the model, but even that shouldn't be worse than training from scratch.

An advantage, but also a shortcoming, of using a pretrained model is the fixed architecture.
While it should be no problem to change e.g. the final classifier (see the sketch below), I'm not sure which approach would work best for e.g. transforming a 2D model into a 3D one.
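For the easy case, here is a minimal sketch of reusing such a multi-label checkpoint for a new task, assuming a ResNet50 backbone and a hypothetical checkpoint file:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Rebuild the architecture the checkpoint was trained with
# (assumed here: ResNet50 with a 10k-label head).
model = models.resnet50()
model.fc = nn.Linear(model.fc.in_features, 10_000)
model.load_state_dict(torch.load("multilabel_checkpoint.pth"))  # hypothetical file

# Optionally freeze the pretrained backbone ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace the final classifier for the new task
# (the new layer's parameters are trainable by default).
model.fc = nn.Linear(model.fc.in_features, 50)  # e.g. 50 new classes

# Only optimize the new head.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```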