`scale_grad_by_freq` in Embedding layer

According to the docs, if `scale_grad_by_freq` is set to `True`, the gradients will be scaled according to the frequency of the words in the dictionary.

But where does the Embedding layer get the frequency of each word? I don't see any parameter that passes this frequency information to the Embedding layer other than `scale_grad_by_freq` itself.

It's scaled by the inverse of the frequency of the words in the mini-batch (not the dictionary). We should fix the docs.
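In PyTorch you only need to pass `scale_grad_by_freq=True` to `nn.Embedding`; no frequency table is required because the counts come from the indices in the current mini-batch. To make that concrete, here is a minimal pure-Python sketch of the rule. The helper `embedding_weight_grad` is hypothetical (it is not PyTorch code) and ignores details like `padding_idx`. It just shows that a word appearing k times in the batch has its accumulated gradient divided by k.

```python
from collections import Counter


def embedding_weight_grad(indices, grad_output, num_embeddings, scale_grad_by_freq=False):
    """Hypothetical re-implementation of an embedding layer's weight gradient.

    indices: word ids looked up in this mini-batch
    grad_output: one upstream gradient row per entry in `indices`
    Returns a num_embeddings x dim gradient for the embedding weight matrix.
    """
    dim = len(grad_output[0])
    grad_weight = [[0.0] * dim for _ in range(num_embeddings)]
    # Frequencies are counted over the mini-batch only; no dictionary-wide counts needed.
    counts = Counter(indices)
    for idx, g in zip(indices, grad_output):
        # With scaling on, each occurrence contributes 1/count of its gradient.
        scale = 1.0 / counts[idx] if scale_grad_by_freq else 1.0
        for j, v in enumerate(g):
            grad_weight[idx][j] += scale * v
    return grad_weight


batch = [1, 1, 1, 2]  # word 1 appears three times in the batch, word 2 once
upstream = [[1.0, 1.0] for _ in batch]

plain = embedding_weight_grad(batch, upstream, num_embeddings=4)
scaled = embedding_weight_grad(batch, upstream, num_embeddings=4, scale_grad_by_freq=True)
print(plain[1], scaled[1])
```

Without scaling, row 1 receives the sum of three gradients; with scaling it receives their average, so frequent words in a batch don't get disproportionately large updates.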


@colesbury, thanks so much for this clarification 🙂

BTW, there's also a `sparse` param in Embedding, but there's no doc for it yet.

Yeah, `sparse` is still a WIP and may change.

The documentation for `scale_grad_by_freq` has now been fixed in master.