I see most networks using an embedding size of 256, 512, or 1024 when dealing with a huge vocabulary. I decide my size based on experiments with the network, observing losses and comparing BLEU scores. So is choosing the size an experiment for everyone, or is there a specific formula for it?
I don’t think there is any specific formula for fixing the embedding size. Like you said, it is usually decided through experimentation. In the case of images, the embedding size can vary with the input image size and the architecture being used.
I see. I use a 4096-dimensional feature vector as the image input on the right side of the network, and on the left side I generate words. It’s a basic merge image-captioning model. But I’m not sure how to work out which embedding size is better. I use 256 because I saw similar models with the same image input size, but I’m trying to understand why.
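For concreteness, here is a minimal NumPy sketch of the merge idea being described: a 4096-d image feature is projected into the same space as the word embeddings and combined by addition. The vocabulary size, parameter initialisation, and the sum-pooling of word embeddings are all assumptions for illustration, not the exact model from the discussion.

```python
import numpy as np

# Dimensions from the discussion; VOCAB_SIZE is an assumed placeholder.
VOCAB_SIZE = 5000      # assumed vocabulary size
EMBED_DIM = 256        # word-embedding size being discussed
IMG_FEAT_DIM = 4096    # e.g. fc-layer features from a CNN encoder

rng = np.random.default_rng(0)

# Randomly initialised parameters, just to make the sketch runnable.
embedding = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))   # word-embedding table
img_proj = rng.normal(size=(IMG_FEAT_DIM, EMBED_DIM))  # image -> embedding space

def merge_step(word_ids, img_feat):
    """One 'merge' step: pool the word embeddings, project the image
    feature into the same EMBED_DIM space, and combine by addition."""
    text_vec = embedding[word_ids].sum(axis=0)  # (EMBED_DIM,)
    img_vec = img_feat @ img_proj               # (EMBED_DIM,)
    return text_vec + img_vec                   # merged representation

merged = merge_step(np.array([1, 42, 7]), rng.normal(size=IMG_FEAT_DIM))
print(merged.shape)  # (256,)
```

The point is that the image pathway is forced through a projection matrix whose output width *is* the embedding size, which is one reason the choice of 256 vs 512 affects both sides of a merge model at once.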
Okay. Maybe you can choose the embedding size based on the complexity of the underlying data/features. Tuning with that in mind might help.
I will try. So is it possible to tell which embedding size is better from the relation between training and validation loss, or from BLEU scores?
Yes, based on loss/accuracy: compare validation loss (and BLEU) across runs that differ only in embedding size, and keep the size that generalises best.
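That comparison can be sketched as a simple sweep. Here `train_and_evaluate` is a hypothetical stand-in for your own training loop; the synthetic losses are placeholders so the sketch runs end to end, not real results.

```python
def train_and_evaluate(embed_dim):
    """Placeholder: replace with real training that returns a
    validation loss (or negative BLEU) for this embedding size."""
    synthetic_val_loss = {64: 3.1, 128: 2.8, 256: 2.6, 512: 2.7, 1024: 2.9}
    return synthetic_val_loss[embed_dim]

candidates = [64, 128, 256, 512, 1024]
results = {d: train_and_evaluate(d) for d in candidates}

# Pick the embedding size with the lowest validation loss.
best = min(results, key=results.get)
print(best)  # 256 with these placeholder losses
```

In practice you would run each size with the same data splits and training budget, and watch for the usual signs: a size that is too small plateaus at a high training loss, while one that is too large overfits (training loss drops but validation loss and BLEU worsen).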