Hi, I wonder if there is any method to do in-place indexing to “crop” the tensor without extra memory cost.
For example, I have a tensor x = torch.rand(2,3,4, device="cuda"). When we index x = x[:,:,0::2], in my opinion we only get back a view of the original data, and the memory cost is still O(2x3x4).
tensor.resize_() seems to be an in-place method, but it is not an indexing operation.
Could you help me? Thank you.
It would be better if you could also discuss both situations: when x requires grad and when it does not.
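To illustrate what I mean (assuming CUDA is available; the prints just show that the slice is a view over the same storage):

import torch

x = torch.rand(2, 3, 4, device="cuda")
v = x[:, :, 0::2]                     # basic slicing returns a view: new shape/strides, same storage
print(v.data_ptr() == x.data_ptr())   # True -- no data was copied
print(v.is_contiguous())              # False -- it strides over the full 2x3x4 buffer
x = v                                 # even after rebinding, the whole 2x3x4 block stays allocated,
                                      # because the view keeps the original storage alive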
If you don't want to hold onto the source tensor but need its values, you obviously have to make a copy, e.g. x2 = x[..., ::2].clone(). At this point, you'd rely on the garbage collector to release x (doing "del x" may help).
As for gradients: I think slicing x won't keep a reference to the original storage (in the autograd code), so it should be the same. In other words, the gradients of indexing/cloning don't depend on the data values; it should be enough to know the shapes/indices to do backprop.
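Something like this (a minimal sketch; the memory_allocated call is just to illustrate that the old block is released once the last reference is gone):

import torch

x = torch.rand(2, 3, 4, device="cuda")
x2 = x[..., ::2].clone()              # compact copy with its own storage
del x                                 # drop the last reference so the allocator can release the old block
torch.cuda.synchronize()
print(torch.cuda.memory_allocated())  # only x2's (smaller) buffer is still accounted for

# gradients work the same way: backward through indexing/clone only needs shapes/indices, not the data
a = torch.rand(2, 3, 4, device="cuda", requires_grad=True)
b = a[..., ::2].clone()
b.sum().backward()
print(a.grad.shape)                   # torch.Size([2, 3, 4])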
Thank you. tensor.clone() can allocate new memory with a new data_ptr, and then the old data can be reclaimed by the garbage collector.
But I wonder if there is a more lightweight method: could we do the in-place indexing without (or with less) extra memory cost? In fact, I think tensor.resize_() is a good direction, but it does not support indexing. Also, .clone() takes some time to copy the tensor.
I don't understand: with x[..., ::2] you already have a non-compact view; you can either keep that or make a compact data copy (clone() or contiguous()).
If you're asking about "dual" objects like a masked array, there is no built-in support for that, and those are not compact either.
If you want to manipulate the metadata with resize_ or set_, I'm not sure you can achieve anything beyond what view() does.
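For example (a small sketch; as_strided just rebuilds by hand the same metadata the slice already has):

import torch

x = torch.rand(2, 3, 4)
v1 = x[..., ::2]                            # shape (2, 3, 2), strides (12, 4, 2)
v2 = x.as_strided((2, 3, 2), (12, 4, 2))    # same metadata built manually
print(torch.equal(v1, v2), v1.data_ptr() == v2.data_ptr())  # True True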
Well, that sounds like a crazy micro-optimization, but if you manage to correctly self-copy the elements without an external buffer (that's only possible with some overlap patterns, I believe, and the copy routine's implementation has to support overlaps), then sure, you can do this.
If you want gradients, I think you'll at least need a new metadata object; that is, instead of resize_, do something like x[:12].view(2,3,2) (that is the 1D-compaction case; if you instead assign into x[..., :2], the storage is still gapped like xx00xx00).
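A rough sketch of that 1D-compaction idea (treat it as an illustration only: copy_ with overlapping source/destination is not officially supported, so this relies on the CPU copy being sequential):

import torch

x = torch.rand(2, 3, 4)                    # contiguous, 24 elements
flat = x.view(-1)
flat[:12].copy_(x[..., ::2].reshape(-1))   # self-copy the kept elements to the front of the storage
y = flat[:12].view(2, 3, 2)                # the new metadata object over the compacted region
# note: the underlying 24-element storage is still allocated; the saving is only that
# no extra buffer (as clone() would allocate) is needed during the copy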
Thank you. In fact, I am just operating on a tensor without gradient tracking.
But I find that resize_ can only crop a contiguous memory region. In practice, I do some operations on a spatio-temporal tensor (B x C x T x H x W) in video models, and I want to do in-place indexing along the T dimension. However, the T dimension is not contiguous in memory, so resize_ produces wrong results.
Now I just use .clone(), although it costs more memory.
You want to compact a dimension in the middle. I'll simplify this to dims representing (B*C, T, H*W) with sizes 2, 4, 3:
import torch

x = torch.zeros(2, 4, 3)
x[:, ::2].copy_(torch.arange(12).view(2, 2, 3))  # values we want to extract
y = x.view(2 * 4, 3)[:4]  # compact, truncated area to copy into
y.copy_(x[:, ::2].view_as(y))
y = y.view(2, 2, 3)  # unflatten head dimensions
This works on CPU with sequential copying. I have a suspicion that this may break on GPU (maybe for some shapes) if thread-block execution gets reordered by the scheduler; really not sure about this.
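For what it's worth, a quick sanity check of the snippet above (on CPU it should hold; on CUDA it is not guaranteed, for the reason just mentioned):

print(torch.equal(y, torch.arange(12.).view(2, 2, 3)))  # True with sequential (CPU) copying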