Index_select same storage as the original tensor

mataney · June 27, 2017, 1:28pm

Hey guys! What’s up?

I’m trying to implement a certain suggestion in this paper.
They suggest that instead of computing the softmax over the whole vocabulary at the last step of the network, you compute it over a chosen subset of the vocabulary (a different subset for each batch).
So I thought of using index_select on each batch, so that when I update the selected tensor, the original tensor is updated as well.
But this is not possible because (as written in the index_select documentation):

The returned Tensor does not use the same storage as the original Tensor

I was wondering whether there’s another way to do this, either by using index_select with the new tensor pointing to the storage of the original tensor, or by some other option.

Thanks, Matan.

tom · June 27, 2017, 6:58pm

Hello

from your description, you might be able to use the Tensor’s index_add_ or index_copy_ methods.
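
As a rough illustration of the two methods mentioned above, here is a minimal sketch. The tensor sizes, the names `weight`, `idx` and `subset`, and the values are made up for the example; it only shows that index_select returns a copy and that index_copy_ / index_add_ write rows back into the original tensor in place.

```python
import torch

# Hypothetical sizes: 6 "vocabulary" rows, 3 features each.
weight = torch.zeros(6, 3)
idx = torch.tensor([1, 4])               # rows chosen for this batch

subset = weight.index_select(0, idx)     # a copy, not a view
subset += 1.0                            # changes the copy only
unchanged = bool((weight == 0).all())    # the original is still all zeros

# Write the edited rows back into the original tensor, in place.
weight.index_copy_(0, idx, subset)

# Alternatively, accumulate a delta into the chosen rows, in place.
weight.index_add_(0, idx, torch.full((2, 3), 0.5))
```

After these calls only rows 1 and 4 of `weight` have changed; the other rows are untouched.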

Best regards

Thomas

mataney · June 28, 2017, 9:27am

Hey man, thanks for your answer.
To be sure I understand your comment: you’re suggesting that I create a new variable (as I explained in the first message), then backprop, and copy the state of the new variable to the corresponding rows in the original variable.
It sounds like this should work, as the original variable is a leaf in the graph.
But I think it’s a bit computationally wasteful.

Do you know of another way, in which the new variable would be a view of the original variable?

Thanks.

tom · June 28, 2017, 6:54pm

Hello,

No, I’m suggesting that you do everything as you normally would, and when it comes to updating the tensor, you use one of those two methods on somevariable.data.

The most typical case of updating a variable is in the optimizers. There you can see that the update modifies the .data of the Variables (well, Parameters, but that makes no difference here):

github.com

pytorch/pytorch/blob/master/torch/optim/sgd.py#L98


            param_state = self.state[p]
            if 'momentum_buffer' not in param_state:
                buf = param_state['momentum_buffer'] = torch.zeros_like(p.data)
                buf.mul_(momentum).add_(d_p)
            else:
                buf = param_state['momentum_buffer']
                buf.mul_(momentum).add_(1 - dampening, d_p)
            if nesterov:
                d_p = d_p.add(momentum, buf)
            else:
                d_p = buf


        p.data.add_(-group['lr'], d_p)


return loss
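
Putting the pieces together, a subset-softmax step might look like the sketch below. This is only an assumption about how the suggestion could be applied: the sizes, the names `out_weight`, `candidates`, `h` and `targets`, and the plain SGD step are invented for illustration. It also uses `torch.no_grad()` from newer PyTorch releases in place of the `.data` access shown in the snippet above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden, batch, lr = 10, 4, 3, 0.1   # hypothetical sizes

out_weight = torch.randn(vocab_size, hidden, requires_grad=True)
before = out_weight.detach().clone()

candidates = torch.tensor([2, 5, 7])   # vocabulary subset chosen for this batch
h = torch.randn(batch, hidden)         # stand-in for the network's last hidden state
targets = torch.tensor([0, 2, 1])      # target positions within `candidates`

# index_select copies the rows, but autograd still tracks the operation.
sub_weight = out_weight.index_select(0, candidates)
logits = h @ sub_weight.t()            # shape: (batch, len(candidates))
loss = F.cross_entropy(logits, targets)
loss.backward()                        # gradient is accumulated in out_weight.grad

# SGD step restricted to the selected rows, applied in place.
with torch.no_grad():
    step = -lr * out_weight.grad.index_select(0, candidates)
    out_weight.index_add_(0, candidates, step)

changed_rows = (out_weight.detach() != before).any(dim=1).nonzero().flatten().tolist()
```

Only the rows listed in `candidates` receive a nonzero gradient, so the update touches just those rows of the original tensor.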

Best regards

Thomas

chetter · March 19, 2018, 6:20pm

I have a similar problem. I tried index_add_ on original.data, but the result of indexing is still silently copied. Is there another way to preserve the same storage?

SimonW · March 19, 2018, 7:09pm

This can’t be done in general. Tensors support strided views, but indexed results can be anywhere in the tensor, so unfortunately the result has to be a copy. (A view that associated each position with an index would be more expensive.) So select + copy is the correct way here.
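
To make the difference concrete, here is a small sketch (the tensor `base` and its values are made up): a regular slice can be described by an offset and strides, so it shares storage with the original, while an arbitrary row selection cannot and is copied.

```python
import torch

base = torch.arange(12.0).reshape(4, 3)

# Every second row: expressible as offset + stride, so it is a view.
strided = base[::2]
shares_storage = strided.data_ptr() == base.data_ptr()
strided[0, 0] = -1.0                     # writes through to base

# Arbitrary rows: no single stride describes them, so a copy is made.
picked = base.index_select(0, torch.tensor([3, 0, 1]))
picked[0, 0] = 100.0                     # base is not affected
```

Here `base[0, 0]` becomes -1.0 through the view, while `base[3, 0]` keeps its value after `picked` is modified.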