MobileBERT is so slow and doesn't converge

Fahimeh · October 8, 2021, 2:04am

Has anyone had any issues with MobileBERT? I use MobileBERT and DistilBERT from Hugging Face with the same code for text classification. DistilBERT converges quickly and gives me good results, but MobileBERT is very slow and doesn’t converge at all. I couldn’t find any useful tips in the MobileBERT paper. Is there anything else I’m missing?
@ptrblck: Do you have any suggestions? Thanks.

ptrblck · October 8, 2021, 5:09am

I’m not deeply familiar with MobileBERT, but from skimming some blog posts it seems that MobileBERT needs a teacher model for training. Could it be that your training script is actually training two models (a larger teacher and the smaller MobileBERT model)?
I’m also sure that @stas is more familiar with these implementations and could help out.

stas · October 8, 2021, 5:44am

No experience with MobileBERT here, but is there any reason why you’re posting this here instead of at https://discuss.huggingface.co or in the huggingface/transformers issue tracker on GitHub?

This is a very HF-specific question, and you will get answers much faster by asking the right audience.

When you post there, it would help to provide some details so others know what you’re doing specifically, e.g. how you train, which versions of the main components you use, etc. Think about what the other person needs to know in order to understand what you’re trying to do.
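
For the version details, a minimal sketch along these lines could be pasted into the report. The helper name `collect_versions` and the default package list are assumptions made for illustration, not something from the thread; adjust the list to whatever your training script actually imports.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers", "datasets", "tokenizers")):
    """Return a dict mapping each name to its installed version.

    `collect_versions` and the default package list are hypothetical,
    chosen only for this example.
    """
    # Always include the interpreter version alongside the packages.
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Including this output together with the model checkpoint names and the training hyperparameters (learning rate, batch size, number of epochs) gives readers most of what they need to reproduce the setup.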

Fahimeh · October 13, 2021, 11:39pm

Thanks for your help.