Embedding layer size meaning

cevvalu · May 26, 2020, 10:43am

I see most networks using an embedding of size 256, 512, or 1024 when dealing with a huge vocabulary. I am choosing my size based on experiments with the network, observing the losses and comparing BLEU scores. So is it an experiment for everyone when deciding the size, or is there a specific formula for this?

mailcorahul · May 26, 2020, 11:10am

I don’t think there is any specific formula for fixing the embedding size. As you said, it can be decided through experimentation. In the case of images, the embedding size can vary with the input image size and the architecture being used.
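One thing that can be worked out before any training is the cost of each candidate size: an embedding layer stores one vector per vocabulary entry, so its weight count is the vocabulary size times the embedding size. A minimal sketch, assuming a hypothetical vocabulary of 10,000 words (the actual vocabulary size is not given in this thread):

```python
def embedding_param_count(vocab_size: int, embedding_dim: int) -> int:
    """Weights in an embedding table: one vector of length embedding_dim per token."""
    return vocab_size * embedding_dim


# Hypothetical vocabulary size, chosen only for illustration.
VOCAB_SIZE = 10_000

# Compare the three sizes mentioned in the question.
param_counts = {dim: embedding_param_count(VOCAB_SIZE, dim) for dim in (256, 512, 1024)}

for dim, count in param_counts.items():
    print(f"embedding_dim={dim:>4}: {count:,} weights")
```

This does not say which size is best, only how quickly the table grows; doubling the embedding size doubles the number of weights the layer has to learn.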

cevvalu · May 26, 2020, 11:19am

I see. I use a one-dimensional vector of size 4096 as the image input on the right side of the network, and on the left side I am generating words. It's a basic merge image-captioning model. But I'm not sure how to tell which embedding size is better. I use 256 because I saw similar models with the same input image size, but I'm trying to understand why.

mailcorahul · May 26, 2020, 11:39am

Okay. Maybe you can choose the embedding size according to the complexity of the underlying data/features. Tuning based on that might help.

cevvalu · May 26, 2020, 11:50am

I will try. So is it possible for me to tell which embedding size is better from the relationship between training and validation loss, or from BLEU scores?

mailcorahul · May 26, 2020, 11:54am

Yes, based on loss/accuracy.
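In practice that comparison could look something like the sketch below: train once per candidate size and keep the one with the lowest validation loss, using BLEU to break ties. The names `pick_embedding_size` and `train_and_evaluate` are hypothetical, and `train_and_evaluate` is assumed to be your own function that builds the model with the given embedding size, trains it, and returns `(validation_loss, bleu)`.

```python
from typing import Callable, Dict, Iterable, Tuple

Metrics = Tuple[float, float]  # (validation_loss, bleu)


def pick_embedding_size(
    candidates: Iterable[int],
    train_and_evaluate: Callable[[int], Metrics],
) -> Tuple[int, Dict[int, Metrics]]:
    """Run one experiment per candidate size and return the best size plus all results.

    train_and_evaluate is a user-supplied (hypothetical) function; it is not
    part of any library.
    """
    results = {size: train_and_evaluate(size) for size in candidates}
    # Lowest validation loss wins; on equal loss, the higher BLEU score wins.
    best = min(results, key=lambda size: (results[size][0], -results[size][1]))
    return best, results


# Usage, once train_and_evaluate exists:
# best, results = pick_embedding_size([256, 512, 1024], train_and_evaluate)
```

Ranking on validation loss rather than training loss matters here, since a larger embedding can keep lowering the training loss while the validation loss stops improving.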