Hi, AlbanD
In that post, the concatenation op doesn’t allocate new memory; it maintains a pointer table that points to the inputs’ shared storage.
The simple solution you suggest below won’t work in general (e.g. when a downstream op requires a tensor rather than a list, or when I want to concatenate two tensors along a specific dimension).
“You can already do that by just using the list with your two Tensors”
To clarify, let me use a simple example to explain what I want.
Suppose we concatenate two tensors with the code below:
t1 = torch.randn(512, 256, 100, 100)
t2 = torch.randn(512, 256, 100, 100)
t = torch.cat([t1, t2], dim=1)
The total memory consumed here is 512x256x100x100x4 float32 values, since t1, t2, and the newly allocated t (which is twice the size of one input) all keep their own storage. Also, simply keeping the list t = [t1, t2] is not a workable substitute.
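For reference, here is a small check (shapes shrunk so it runs quickly; the accounting is the same) showing that torch.cat copies both inputs into freshly allocated storage, which is where the 4x factor comes from:

import torch

t1 = torch.randn(8, 4, 10, 10)
t2 = torch.randn(8, 4, 10, 10)
t = torch.cat([t1, t2], dim=1)   # new storage that copies t1 and t2

bytes_per_input = t1.element_size() * t1.nelement()
total_bytes = sum(x.element_size() * x.nelement() for x in (t1, t2, t))
print(total_bytes == 4 * bytes_per_input)   # True: t alone is twice one input
print(t.data_ptr() != t1.data_ptr())        # True: t does not alias t1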
Is it possible to implement a memory-efficient concatenation like the following?
t1 = torch.randn(512, 256, 100, 100)
t2 = torch.randn(512, 256, 100, 100)
t_efficient = torch.cat([t1, t2], dim=1, allocation="shared")
The variable t_efficient would just record references to the storage of t1 and t2 rather than allocating new memory, so the total memory consumption would be 512x256x100x100x2 float32 values.
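To make the desired semantics concrete, here is a rough sketch under my own assumptions (the LazyConcat class below is hypothetical, not an existing PyTorch API): it only records references to the inputs and falls back to a real torch.cat copy when a contiguous tensor is unavoidable, so the steady-state footprint stays at the two inputs.

import torch

class LazyConcat:
    # Hypothetical wrapper: stores references to the inputs instead of
    # copying them into a freshly allocated tensor.
    def __init__(self, tensors, dim=0):
        self.tensors = list(tensors)   # just references, no new storage
        self.dim = dim

    @property
    def shape(self):
        sizes = list(self.tensors[0].shape)
        sizes[self.dim] = sum(t.shape[self.dim] for t in self.tensors)
        return torch.Size(sizes)

    def materialize(self):
        # Only here does a real copy happen, and only if a caller needs it.
        return torch.cat(self.tensors, dim=self.dim)

# Shapes reduced from 512x256x100x100 so the sketch runs anywhere.
t1 = torch.randn(8, 4, 10, 10)
t2 = torch.randn(8, 4, 10, 10)
t_efficient = LazyConcat([t1, t2], dim=1)   # no 8x8x10x10 buffer allocated
print(t_efficient.shape)                    # torch.Size([8, 8, 10, 10])

Of course, any op that needs a single contiguous tensor would still have to call materialize(), so this only helps when the concatenated result can be consumed piecewise.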