How to do in-place indexing?

Hi, I wonder if there is any method to do in-place indexing to “crop” the tensor without extra memory cost.

For example, I have a tensor x = torch.rand(2, 3, 4, device="cuda"). When I index it with x = x[:, :, 0::2], as far as I understand this only returns a view of the original data, so the memory cost is still O(2x3x4).

tensor.resize_() seems to be an in-place method, but it is not an indexing operation.

Could you help me? Thank you.

It would be even better if you could discuss both situations: when x requires grad and when it does not.

If you don’t want to hold onto the source tensor but need its values, you obviously have to make a copy, e.g. x2 = x[..., ::2].clone(). At this point, you’d rely on the garbage collector to release x (doing "del x" may help).
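For illustration, a minimal sketch of this clone-then-release approach (the tensor size and the memory_allocated() readings are my own additions; a larger tensor is used so the caching allocator's block rounding doesn't hide the effect):

import torch

x = torch.rand(64, 3, 1024, device="cuda")
print(torch.cuda.memory_allocated())  # x's full storage is allocated

x2 = x[..., ::2].clone()              # compact copy of the strided slice
del x                                 # drop the last reference to x's storage
print(torch.cuda.memory_allocated())  # only x2's smaller allocation remains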

As for gradients… I think the slicing of x won’t keep a reference to the original storage (in the autograd code), so it should be the same. In other words, gradients of indexing/cloning don’t depend on data values; it should be enough to know shapes/indices to do backprop.
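A quick check of that gradient behaviour (my sketch): backprop through slice + clone only scatters the incoming gradient back by index, so the dropped positions simply receive zeros.

import torch

x = torch.rand(2, 3, 4, requires_grad=True)
y = x[..., ::2].clone()
y.sum().backward()
print(x.grad[..., ::2])   # ones: gradient flows to the selected elements
print(x.grad[..., 1::2])  # zeros: the dropped elements get zero gradient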

Thank you. tensor.clone() allocates new memory with a new data_ptr, so the old data can then be reclaimed by the garbage collector.

But I wonder if there is a more lightweight method: could we do the indexing in place, without (or with less) extra memory cost? I think tensor.resize_() looks like a good fit, but it does not support indexing. Also, .clone() takes some time to copy the tensor.

Thank you for your reply.

I don’t understand: with x[..., ::2] you already have a non-compact view, so you can either keep that or make a compact data copy (clone() or contiguous()).
If you’re asking about "dual" objects like a masked array, there is no built-in support for that, and those are not compact either.
If you want to manipulate metadata with resize_ or set_, I’m not sure you can achieve anything beyond what view() does.
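To illustrate the first point, a small check (mine) that x[..., ::2] is already a non-compact view sharing the original storage, while contiguous() (or clone()) makes a compact copy in new memory:

import torch

x = torch.rand(2, 3, 4)
v = x[..., ::2]                       # strided view, nothing is copied
print(v.is_contiguous())              # False: non-compact view
print(v.data_ptr() == x.data_ptr())   # True: same underlying storage
c = v.contiguous()                    # compact copy in new storage
print(c.data_ptr() == x.data_ptr())   # False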

Hi Alex, so many thanks for your patience.

The issue is that I do not want to make a compact data copy, since that incurs extra memory cost. I only want to keep the data region I actually use and release the rest.

I have a possible solution. If I only want the x[..., ::2] data, I can assign it like this:

x[..., :2] = x[..., ::2]

Then I can use the resize_ method:

x.resize_(2,3,2)

This way, there is no extra memory cost to keep only a part of the tensor.

Do you think this approach is good, or is there a better solution?

Thank you.

Well, that sounds like a crazy micro-optimization, but if you manage to correctly self-copy elements without an external buffer (that’s only possible with some overlap patterns, I believe, and the copy routine’s implementation has to support overlaps), sure, you can do this.
If you want gradients, I think you’ll at least need a new metadata object, i.e. instead of resize_ do something like x.view(-1)[:12].view(2, 3, 2) (that only covers the 1d-compaction case; if you just assign to x[..., :2], the storage is still gapped like xx00xx00).
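A small demonstration of the "gapped like xx00xx00" point (my sketch): after the self-assignment proposed above, the kept values are not packed at the front of the storage, so reinterpreting the first 12 storage elements (which is all resize_ would do) mixes wanted and stale values.

import torch

x = torch.arange(24.).view(2, 3, 4).clone()
x[..., :2] = x[..., ::2]   # keep every other element along the last dim
print(x.flatten())         # storage pattern per row: kept, kept, stale, stale
x.resize_(2, 3, 2)         # only reinterprets the first 12 storage elements
print(x)                   # mixes the wanted values with stale ones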

Thank you. In fact, I am just operating on a tensor without gradient tracking.

But I find that resize_ can only crop a contiguous memory region… In practice, I do some operations on a spatio-temporal tensor (B x C x T x H x W) in video models, and I want to do in-place indexing along the T dimension. However, the T dimension is not contiguous in memory, so resize_ produces wrong results.

Now I just use .clone(), although it costs more memory.

Well, let’s see…

You want to compact a dimension in the middle; I’ll simplify this to dims representing (B*C, T, H*W) with sizes 2, 4, 3:

x = torch.zeros(2, 4, 3)
x[:, ::2].copy_(torch.arange(12).view(2, 2, 3))  # values we want to extract
y = x.view(2 * 4, 3)[:4]  # compact and truncated area to copy into
y.copy_(x[:, ::2].view_as(y))
y = y.view(2, 2, 3)  # unflatten head dimensions

This works on CPU with sequential copying; I have a suspicion that this may break on GPU (maybe for some shapes) if thread block execution gets reordered by the scheduler, but I’m really not sure about this.
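Applied to the original 5D video case, the same trick would look roughly like this (my sketch; the toy sizes are made up, and the same caveat about GPU copy ordering applies):

import torch

B, C, T, H, W = 2, 3, 8, 4, 4
x = torch.arange(B * C * T * H * W, dtype=torch.float32).view(B, C, T, H, W)
ref = x[:, :, ::2].clone()                          # expected result, only for the check below

flat = x.view(B * C, T, H * W)                      # collapse the dims around T
y = flat.view(B * C * T, H * W)[:B * C * (T // 2)]  # packed region at the front of the storage
y.copy_(flat[:, ::2].view_as(y))                    # overlapping self-copy (sequential on CPU)
y = y.view(B, C, T // 2, H, W)                      # unflatten back to (B, C, T//2, H, W)

print(torch.equal(y, ref))                          # True on CPU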


Well, thank you so much, Alex. That is a nice indexing trick.

I think after this the desired values are copied into a contiguous memory region at the front of the storage. Then I can use the resize_ method to crop the tensor.

Really great job. Now I know how to do it.

Thank you!

However, I find that resize_() cannot save memory… it still shares the same memory space with the original tensor.

Maybe the only way is to use clone() and then rely on the garbage collector to release the original tensor.

Ah, right, you also need to shrink the actual allocation, not just the tensor metadata (resize_ on the tensor is not needed in my snippet). Check if this will work for you:

x = torch.zeros(1 << 28)
y = x[:1000]
del x
y.storage().resize_(1000)
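For reference, re-running that snippet with a couple of checks added (my additions; storage().size() counts elements):

import torch

x = torch.zeros(1 << 28)            # ~1 GiB of float32
x[:1000] = torch.arange(1000.)      # recognizable values in the part we keep
y = x[:1000]
kept = y.clone()                    # reference copy, only for the check below
del x                               # y still pins the full storage at this point
print(y.storage().size())           # 268435456 elements
y.storage().resize_(1000)           # shrink the storage itself
print(y.storage().size())           # 1000 elements
print(torch.equal(y, kept))         # check that the kept prefix survived the shrink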