Torch.max(tensor1, tensor2) memory problem

Hi,

For example, suppose I have a bunch of feature maps, say 100 of them, each of shape [128, 256, 256].
I want to get the pixel-wise maximum over these 100 feature maps.
The easiest way is to use torch.cat() to concatenate the 100 feature maps and then call torch.max() to get the result, but my memory is not big enough to hold the concatenated tensor.
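Roughly, the concatenation approach I mean looks like this (torch.stack() here just adds the extra dimension that torch.cat() would otherwise need, and the random tensors only stand in for my real maps):

import torch

feature = [torch.randn(128, 256, 256) for _ in range(100)]  # stand-in for my 100 maps

stacked = torch.stack(feature, dim=0)   # shape [100, 128, 256, 256] -- too big for my memory
out = stacked.max(dim=0).values         # pixel-wise maximum, shape [128, 256, 256]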
So I thought I could use a for-loop to compute the max instead. The code looks like this:

out = feature[0]
for i in range(1, 100):
    # keep a running pairwise maximum, so only two maps are in play at a time
    out = torch.max(out, feature[i])

but I find that memory still keeps growing. I believe this is because torch.max(out, feature[i]) also needs to allocate a contiguous block to hold its result. Is there a way to free this memory after the calculation? Ideally we don't need to keep all of these intermediate tensors for autograd; we only need to remember which map each pixel's maximum came from, and then we can compute the gradient.

I assume feature is a list holding all tensors?
If that’s the case, you would have enough memory to at least store the data once.
Could you try to create a feature tensor in the first place instead of a list?
If you are working close to the memory limit, the memory peaks might yield an OOM error.
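Something along these lines is what I mean; compute_feature_map() is just a hypothetical stand-in for however you actually produce each map:

import torch

# Write each map into one pre-allocated tensor instead of keeping a Python list.
feature = torch.empty(100, 128, 256, 256)
for i in range(100):
    feature[i] = compute_feature_map(i)  # hypothetical producer of one [128, 256, 256] map

out = feature.max(dim=0).values          # pixel-wise max without any torch.cat copy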

That being said, if you don’t need gradients for the calculation, you could wrap the code in a with torch.no_grad() block.
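For the loop in your post, that would be something like:

import torch

with torch.no_grad():                    # no autograd graph is recorded inside this block
    out = feature[0]
    for i in range(1, 100):
        out = torch.max(out, feature[i]) # intermediate results can be freed right away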

Thanks for replying.

I realize my original statement was misleading. Sorry about that; let me make the problem clearer.

Suppose I have 300 vectors stored in a tensor called feature, of shape [300, 1024]. We also have an index tensor of shape [300] that indicates which vectors should be sent into the max() operation; index looks something like [1, 0, 1, 0, 0, 1, …, 0].

The easiest way to do this is a single line of code:
out = torch.max(feature[torch.where(index == 1)], dim=0).values
But we have limited memory, and I guess the feature[torch.where(index == 1)] part will allocate new memory, which is not acceptable.
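One way to confirm this (assuming feature and index live on the GPU) would be:

import torch

before = torch.cuda.memory_allocated()
selected = feature[torch.where(index == 1)]   # advanced indexing materialises the selected rows
after = torch.cuda.memory_allocated()
print("extra bytes allocated by the selection:", after - before)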

So I think there is another way: first allocate memory to hold the running maximum, then use a for-loop to compare each feature vector with the result one by one.

out = torch.zeros([1024])  # running maximum buffer (use -inf instead of zeros if features can be negative)
for i in range(300):
    if index[i] == 1:
        out = torch.max(out, feature[i])
So ideally we only need to allocate a single [1024]-sized buffer for out, which our GPU can handle. (Both index and feature are tensors here, but we don't want to create a big intermediate tensor, hence the for-loop.)

However, I find that memory still grows with every iteration of the for-loop. I don't know why, or how to avoid it.

We need the out tensor to carry gradients for back-propagation, so I guess that is why PyTorch doesn't release the memory after torch.max(out, feature[i]). But ideally we would only need one tensor of the same size as out to remember where each maximum value came from, and then we could still do back-propagation. Is there a way to do that?
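Something like this sketch is roughly what I have in mind (shapes as above; the bookkeeping is only my own illustration, and I'm assuming at least one row is selected):

import torch

feature = torch.randn(300, 1024, requires_grad=True)
index = torch.randint(0, 2, (300,))
selected = torch.where(index == 1)[0]          # rows that take part in the max

with torch.no_grad():                          # the argmax search itself needs no graph
    running_max = feature[selected[0]].clone()
    running_arg = selected[0].repeat(1024)     # which row currently holds each column's max
    for i in selected[1:]:
        row = feature[i]
        mask = row > running_max
        running_max = torch.where(mask, row, running_max)
        running_arg = torch.where(mask, i, running_arg)

# A single differentiable gather: autograd only has to remember these indices,
# so gradients flow back to exactly the winning entries of feature.
out = feature[running_arg, torch.arange(1024)]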