MobileBERT is so slow and doesn't converge

Has anyone had issues with MobileBERT? I'm using MobileBERT and DistilBERT from Hugging Face with the same code for text classification. DistilBERT converges quickly and gives me good results, but MobileBERT trains very slowly and doesn't converge at all. I couldn't find any useful tips in the MobileBERT paper. Is there anything I'm missing?
@ptrblck: Do you have any suggestions? Thanks.

I’m not deeply familiar with MobileBERT but by skimming through some blog posts it seems that the MobileBERT model needs a teacher model for the training? Could it be the case that your training script is indeed training two models (a larger teacher and the smaller MobileBERT model)?
I’m also sure that @stas is more familiar with these implementations and could help out.
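
One quick way to check whether more than one model is being trained is to compare the number of trainable parameters in the model with the number the optimizer actually updates. This is only a rough sketch: it assumes a PyTorch-style model and optimizer (`model.parameters()`, `optimizer.param_groups`, tensors with `numel()` and `requires_grad`), and the two helper names are made up for illustration.

```python
def count_trainable_parameters(model):
    """Sum the element counts of all model parameters that receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def count_optimized_parameters(optimizer):
    """Sum the element counts of all parameters the optimizer will update."""
    return sum(
        p.numel()
        for group in optimizer.param_groups  # list of dicts in PyTorch optimizers
        for p in group["params"]
        if p.requires_grad
    )


# Hypothetical usage inside a training script:
#   print(count_trainable_parameters(model))
#   print(count_optimized_parameters(optimizer))
```

If the second number is much larger than the first, the optimizer is probably holding parameters from another model as well.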


No experience with MobileBERT here, but is there any reason why you’re posting this here instead of under Issues · huggingface/transformers · GitHub?

This is a very HF-specific question, and you will get a much better turnaround on answers by asking the right audience :wink:

When you make a post there, it’d help to provide some details so others know what you’re doing specifically, e.g. how you train, which versions of the main components you use, etc. Think about what the other person needs to know in order to understand what you’re trying to do.
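
For the version details, something like the following could be pasted into the post. It is a minimal sketch using only the standard library. The package list is an assumption about what a typical Hugging Face text-classification setup uses, so adjust it to your environment.

```python
import platform
from importlib import metadata


def collect_versions(packages=("transformers", "torch", "tokenizers", "datasets")):
    """Return a {name: version} dict; packages that are missing are reported as such."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Including this output along with the training command and hyperparameters (learning rate, batch size, number of epochs) gives readers most of what they need to reproduce the problem.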

Thanks for your help.