Conventions for training an embedding layer


I am training a neural network that includes an embedding layer.

Is there a conventional training strategy for this?

I saw many optional parameter settings in the Embedding layer (scaling gradients by frequency, max norm, etc.), and I don't know when to use these parameters…
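For context, here is a minimal sketch of those options, assuming you mean PyTorch's `torch.nn.Embedding` (the parameter names `max_norm` and `scale_grad_by_freq` match what you describe; the vocabulary size and dimension below are made up):

```python
import torch
import torch.nn as nn

# Assumed setup: PyTorch nn.Embedding with the optional parameters
# mentioned above. Sizes are illustrative only.
emb = nn.Embedding(
    num_embeddings=10_000,    # vocabulary size
    embedding_dim=128,
    padding_idx=0,            # row 0 stays zero and receives no gradient
    max_norm=1.0,             # looked-up rows are renormalized to L2 norm <= 1
    scale_grad_by_freq=True,  # scale gradients by inverse in-batch word frequency
)

ids = torch.tensor([[1, 2, 2, 0]])  # a tiny batch of token ids
out = emb(ids)
print(out.shape)  # torch.Size([1, 4, 128])
```

Roughly: `max_norm` keeps embedding vectors from growing unboundedly (a mild regularizer), and `scale_grad_by_freq` damps the updates of very frequent words so they don't dominate training. Both are optional; many models use neither.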


Have you tried using pretrained embeddings?

Yes, but I think they are a poor fit for my task, and there is also an out-of-vocabulary issue…
I tried GoogleNews word2vec, fastText, and BERT.
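One common way to soften the out-of-vocabulary issue when loading pretrained vectors is to prepend trainable `<pad>`/`<unk>` rows and map unknown words to `<unk>`. A minimal sketch (the 4-row "pretrained" matrix and the vocabulary here are invented for illustration):

```python
import torch
import torch.nn as nn

# Stand-in for a loaded pretrained matrix: 4 known words, dimension 8.
pretrained = torch.randn(4, 8)
pad = torch.zeros(1, 8)  # index 0: padding
unk = torch.zeros(1, 8)  # index 1: unknown / OOV bucket
weights = torch.cat([pad, unk, pretrained], dim=0)

# freeze=False lets the rows (including <unk>) keep training on your task.
emb = nn.Embedding.from_pretrained(weights, freeze=False, padding_idx=0)

vocab = {"<pad>": 0, "<unk>": 1, "the": 2, "cat": 3, "sat": 4, "mat": 5}
ids = [vocab.get(w, vocab["<unk>"]) for w in ["the", "cat", "flew"]]
print(ids)  # [2, 3, 1]  ("flew" is OOV, so it maps to <unk>)
```

This doesn't fully solve OOV, but with `freeze=False` the pretrained rows also get fine-tuned toward your task, which often helps when the pretrained domain doesn't match yours.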

Moreover, I built my own corpus and pretrained embeddings on it with the gensim skip-gram model, which doesn't work either… (I also face numerous design decisions during that training, e.g. the number of epochs, the embedding dimension, min_count…)

So I think training the embedding layer jointly with the rest of the network might be the better choice… Wouldn't it?
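Joint training is just letting the task loss backpropagate into the embedding table. A minimal sketch, assuming PyTorch (the tiny classifier, sizes, and random data below are all illustrative):

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Illustrative model: embedding trained end-to-end with the task head."""
    def __init__(self, vocab_size=100, dim=16, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, ids):
        # Mean-pool the token embeddings, then classify.
        return self.fc(self.emb(ids).mean(dim=1))

model = TinyClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

ids = torch.randint(1, 100, (8, 5))     # fake batch of token ids
labels = torch.randint(0, 2, (8,))      # fake labels

before = model.emb.weight.detach().clone()
loss = nn.functional.cross_entropy(model(ids), labels)
opt.zero_grad()
loss.backward()
opt.step()

# The looked-up embedding rows were updated by the task loss.
print((model.emb.weight != before).any().item())
```

This is the default in most modern NLP models; pretrained vectors mainly help when your labeled data is small relative to the vocabulary.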