I’m trying to implement a suggestion from this paper.
Instead of taking the softmax over the entire vocabulary at the last layer of the network, they suggest taking it over a chosen subset of the vocabulary (a different subset for each batch).
So I thought of using index_select to pick out the relevant rows for each batch, hoping that updating the selected tensor would update the original tensor as well.
But this is not possible because (as written in the index_select documentation):
The returned Tensor does not use the same storage as the original Tensor
I was wondering whether there is another way to achieve this: either by making the tensor returned by index_select view the storage of the original tensor, or by some other means.
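For anyone who wants to see the behavior concretely, here’s a minimal sketch of the problem (the tensor names and shapes are just illustrative):

```python
import torch

# Toy "output layer" weight: vocab_size x hidden (names are illustrative)
weight = torch.zeros(5, 3)
idx = torch.tensor([0, 2])

sub = weight.index_select(0, idx)  # returns a copy, not a view
sub += 1.0                         # modify the selected rows

# The original tensor is unchanged, confirming the copy semantics
print(weight.sum().item())                  # 0.0
print(sub.data_ptr() == weight.data_ptr())  # False
```

Since the two tensors have different storage, writes to `sub` never reach `weight`, which is exactly the issue described above.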
Hey, thanks for your answer.
To make sure I understand your comment: you’re suggesting to create a new variable (as I described in my first message), backprop through it, and then copy the state of the new variable back into the corresponding rows of the original variable.
It sounds like this should work, since the original variable is a leaf in the graph.
But it seems a bit computationally wasteful.
Do you know of another way, one where the new variable is a view into the original variable?
This can’t be done in general. Tensors support strided views, but the result of an indexing operation can be anywhere in the tensor, so it unfortunately has to be a copy. (Making every position carry its own index would be even more expensive.) So select + copy is the correct way here.