index_select with the same storage as the original tensor

Hey guys! What’s up?

I’m trying to implement a suggestion from this paper.
The authors suggest that instead of taking the softmax over the entire vocabulary at the last step of the network, you should take it over a chosen subset of the vocabulary (a different subset for each batch).
So I thought of using index_select on each batch, with the idea that when I update the selected tensor, the update would apply to the original tensor as well.
But this is not possible, because (as stated in the index_select documentation):

The returned Tensor does not use the same storage as the original Tensor

I was wondering whether there is another option that would let me do this, either by having index_select return a tensor that points to the storage of the original tensor, or by any other means.
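
To make it concrete, here is a minimal sketch of what I’m after (the sizes and indices below are made up):

```python
import torch

vocab_size, hidden = 10000, 512
W = torch.randn(vocab_size, hidden, requires_grad=True)  # full output weights

subset_idx = torch.tensor([3, 17, 42])   # vocabulary subset for this batch
W_sub = W.index_select(0, subset_idx)    # a copy, NOT a view of W

# W_sub has its own storage, so in-place updates to it never reach W:
print(W_sub.data_ptr() == W.data_ptr())  # False
```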

Thanks, Matan.

Hello

From your description, you might be able to use the Tensor’s index_add_ or index_copy_ methods.
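
For instance (a tiny sketch, with made-up tensor names):

```python
import torch

W = torch.zeros(5, 3)          # the original tensor
idx = torch.tensor([0, 2])     # rows to update
src = torch.ones(2, 3)         # values to write / add

W.index_copy_(0, idx, src)     # overwrites rows 0 and 2 of W in place
W.index_add_(0, idx, src)      # accumulates src into rows 0 and 2 in place
```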

Best regards

Thomas

Hey man, thanks for your answer.
Just to be sure I understand your comment: you’re suggesting to create a new variable (as I explained in the first message), then backprop, and then copy the state of the new variable to the corresponding rows of the original variable.
It sounds like this should work, since the original variable is a leaf in the graph.
But I think it’s a bit computationally wasteful.

Are you familiar with another way, one where the new variable would be a view into the original variable?

Thanks.

Hello,

No, I’m suggesting that you do things as you normally would, and when it comes to updating the tensor, you use one of the two methods on somevariable.data.

The most typical case of updating a variable is in the optimizers; there you can see that the update modifies the .data of the Variables (well, Parameters, but that’s the same thing for our purposes here).
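
For example, a manual SGD-style step could look roughly like this (just a rough sketch; W, subset_idx, and the learning rate are placeholders):

```python
import torch

# hypothetical names: W is the full output weight, subset_idx the batch's vocab subset
W = torch.randn(10000, 512, requires_grad=True)
subset_idx = torch.tensor([3, 17, 42])
lr = 0.1

W_sub = W.index_select(0, subset_idx)   # forward pass; autograd scatters grads back to W
loss = W_sub.sum()                      # stand-in for the real loss
loss.backward()                         # W.grad now holds gradients for the selected rows

# the update itself happens on .data, outside of autograd, like in the optimizers:
grad_sub = W.grad.data.index_select(0, subset_idx)
W.data.index_add_(0, subset_idx, -lr * grad_sub)
W.grad.data.zero_()
```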

Best regards

Thomas

I have a similar problem. I tried index_add_ on original.data, and the result of the indexing is still silently copying. Is there any other way to preserve the same storage?

This can’t be done in general. Tensors support strided views, but the elements picked out by an arbitrary index can be anywhere in the tensor, so the result unfortunately has to be a copy. (A view where every position carried its own index would make each access more expensive.) So select + copy is the correct way here.
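
A small snippet to illustrate the difference (the tensors here are just toy examples):

```python
import torch

a = torch.arange(12.).reshape(3, 4)

v = a[1]        # basic slicing gives a strided view...
v += 100        # ...so modifying it modifies a too
print(a[1])     # tensor([104., 105., 106., 107.])

idx = torch.tensor([0, 2])
s = a.index_select(0, idx)  # arbitrary indexing has to copy...
s += 100                    # ...so a stays untouched
print(a[0])                 # tensor([0., 1., 2., 3.])

# hence the select -> modify -> copy-back pattern:
a.index_copy_(0, idx, s)
```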