I have a scenario in which I have a 5128 dimensional parameter.
I also have a [16, 1] dimensional variable.
What I want to do?
Use the elements of the 16 x 1 dimensional variable as index to get the elements of the 5128 dimensional tensor.
In Python I would have done it like: [parameter[i] for i in variable], but here I want to tune the parameters, so I need to do it in a way gradients flow.
You can do that and it will work wereas the dimensional variable is fixed. If you expect to make a learnable vector which chooses parameters it will not work.
Aditionally, consider that those elements which aren’t “chosen” will does not backpropagate anything. You can always go for other ways like attention rather than this.