I’m trying to understand the code here.
Can you help me understand why Embedding needs to use index_select rather than plain [...] indexing?
I also wonder why it has to have its own backward method. Can’t autograd figure out the indexing?
I was asking about index_select because I hoped it would give me a sparse gradient if I used it instead of [...] in Python. But that doesn’t seem to work: the gradient still comes back dense.
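For reference, here is a minimal sketch of the comparison described above. The tensor names and shapes (`weight`, `idx`) are made up for illustration, and the behaviour shown is what I would expect on a recent PyTorch release: both indexing styles accumulate a dense gradient of the same shape as the weight.

```python
import torch

# Hypothetical toy "embedding table": 5 rows, 3 columns.
weight = torch.randn(5, 3, requires_grad=True)
idx = torch.tensor([0, 2, 2])  # row 2 is looked up twice

# Plain [...] indexing.
weight[idx].sum().backward()
grad_bracket = weight.grad.clone()
weight.grad = None  # reset before the second backward pass

# index_select along dim 0.
weight.index_select(0, idx).sum().backward()
grad_select = weight.grad.clone()

# Both gradients are ordinary dense (strided) tensors with identical values.
print(grad_bracket.is_sparse, grad_select.is_sparse)
print(torch.equal(grad_bracket, grad_select))
```

Rows that were never looked up simply hold zeros in the dense gradient, so switching between the two indexing styles does not by itself change the gradient layout.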
So, if I understand correctly, you are saying that autograd can produce sparse gradients by itself, and that the Embedding class only added a custom backward method here because the authors wanted extra checks?
Is there an indexing function that provides sparse gradients? Or should I implement indexing by means of a sparse matrix multiplication in order to achieve this?
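One option that might cover this without a hand-written sparse matrix multiplication is the `sparse=True` flag of `torch.nn.functional.embedding`, which performs a row lookup and asks for a sparse gradient with respect to the weight. The sketch below uses made-up names and shapes (`weight`, `idx`), and it is only an illustration of the flag, not a claim about how Embedding is implemented internally.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy table: 5 rows, 3 columns.
weight = torch.randn(5, 3, requires_grad=True)
idx = torch.tensor([0, 2, 2])  # row 2 is looked up twice

# Row lookup with a sparse gradient requested for `weight`.
rows = F.embedding(idx, weight, sparse=True)
rows.sum().backward()

# The accumulated gradient uses a sparse layout.
print(weight.grad.is_sparse)

# Converting to dense sums duplicate entries, so row 2 receives
# a contribution from each of its two lookups.
dense_grad = weight.grad.to_dense()
print(dense_grad)
```

Note that only some optimizers accept sparse gradients (for example `torch.optim.SGD` and `torch.optim.SparseAdam`), so it is worth checking the optimizer before relying on this.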