Is there any way to avoid OOM?

wayne1226 · May 14, 2022, 8:34am

I do an element-wise multiplication as:

    a = a.unfold(1, val*2, 1).unfold(2, val*2, 1)
    b = b.unsqueeze(dim=0).unfold(1, val*2, 1).unfold(2, val*2, 1)
    a = (a * (a > c)).sum((3, 4)) / (a > c).sum((3, 4))

Here, a is a [3,256,256,256,256] tensor and b is [1,256,256,256,256], and I get an OOM when I run this code. I also tried an in-place operation like a*=(a>c); although the memory consumption is reduced, I still get an OOM. Even (a>c).sum((3,4)) consumes a lot of memory. Is there any way to avoid the OOM and implement the same functionality?

ptrblck · May 15, 2022, 11:21pm

a will already allocate 48GB in float32 and b 16GB. Additionally a>c will create a temp. tensor, which should also consume 16GB, so you would be at 80GB to store these tensors alone.
Since the CUDA context and potentially other tensors would also need some memory, I don’t think you could fit it into a GPU without scaling down the problem.
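
The sizes for a and b can be checked with a few lines of plain Python. This is only a back-of-the-envelope sketch: it assumes the shapes quoted in the question and 4 bytes per float32 element, and the helper name `gib` is made up for illustration.

```python
# Rough memory estimate for dense float32 tensors with the shapes from the question.
# `gib` is a hypothetical helper, not part of PyTorch.
from math import prod

def gib(shape, bytes_per_element):
    """Size in GiB of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element / 2**30

a_shape = (3, 256, 256, 256, 256)
b_shape = (1, 256, 256, 256, 256)

a_f32 = gib(a_shape, 4)  # float32 uses 4 bytes per element
b_f32 = gib(b_shape, 4)

print(f"a: {a_f32} GiB, b: {b_f32} GiB")
```

Any elementwise result with the same shape as a, such as the product a * (a > c), needs another float32 buffer of the same size as a on top of that.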

wayne1226 · May 16, 2022, 1:47am

Thank you for your reply.
What I want to do is similar to a ‘2-d convolution with a different kernel at each element location’. For now I implement it with a for-loop like this:

    rendered_img = torch.zeros(1, 3, 256, 256).cuda()
    for idx in range(val, color.shape[1] - val):
        for idy in range(val, color.shape[2] - val):
            b_block = b[idx - val:idx + val, idy - val:idy + val].clone()
            a_block = a[..., idx - val:idx + val, idy - val:idy + val].clone()
            c_1 = b_block > c
            rendered_img[..., idx - val, idy - val] = (c_1 * a_block).sum((1, 2)) / c_1.sum()

Although this implementation avoids the OOM at this image scale, it is time-consuming and cannot handle larger images. Can you give me some advice on this? Or do I have to trade off speed against memory? If so, what if I don’t care about the timing?

ptrblck · May 16, 2022, 5:02am

If you don’t care about the timing, then the nested loop should work as it should use the least necessary memory.
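
A possible middle ground between the full unfold and the nested loop is to vectorize only a few output rows at a time. The following is a minimal sketch, not the code from this thread: it assumes a is [C, H, W], b is [H, W], and c is a scalar threshold, and the names `masked_window_mean` and `rows_per_chunk` are made up for illustration. `Tensor.unfold` returns a view, so the large allocation is the elementwise product, whose size is bounded by the chunk height.

```python
import torch

def masked_window_mean(a, b, c, val, rows_per_chunk=8):
    """Row-chunked variant of the nested loop (hypothetical helper).

    a: [C, H, W] values, b: [H, W] tensor compared against the scalar c.
    Every full window of size 2*val x 2*val is evaluated, so the result
    has shape [C, H - 2*val + 1, W - 2*val + 1].
    """
    k = 2 * val
    C, H, W = a.shape
    out_h, out_w = H - k + 1, W - k + 1
    out = torch.empty(C, out_h, out_w, dtype=a.dtype, device=a.device)
    for r0 in range(0, out_h, rows_per_chunk):
        r1 = min(r0 + rows_per_chunk, out_h)
        # Input rows r0 .. r1 + k - 2 cover every window starting in [r0, r1).
        a_win = a[:, r0:r1 + k - 1].unfold(1, k, 1).unfold(2, k, 1)  # view: [C, rows, out_w, k, k]
        b_win = b[r0:r1 + k - 1].unfold(0, k, 1).unfold(1, k, 1)     # view: [rows, out_w, k, k]
        mask = b_win > c
        num = (a_win * mask).sum((-2, -1))  # only this product is materialized per chunk
        den = mask.sum((-2, -1))            # windows with an empty mask give nan, as in the loop
        out[:, r0:r1] = num / den
    return out
```

A smaller `rows_per_chunk` lowers peak memory at the cost of more Python-level iterations; setting it to 1 is close to the nested loop in memory use while still vectorizing over columns.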

wayne1226 · May 16, 2022, 9:01am

I’m trying to add more constraints to scale down the problem. Thanks again for your reply.

wayne1226 · May 27, 2022, 2:03am

I’m now scaling down the problem, but the OOM still occurs at times. Can I chunk the tensors ‘a’ and ‘b’ across multiple GPUs? I have tried, but it doesn’t seem to work.
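
For reference, one way such a split could look is to divide the output rows into bands and give each device its band plus the extra input rows its windows reach into. This is a hedged sketch under assumptions, not a verified multi-GPU solution: a is taken to be [C, H, W], b is [H, W], c is a Python scalar, the function names are made up, and the device list falls back to the CPU when no GPU is visible.

```python
import torch

def window_mean(a, b, c, k):
    """Masked mean over every k x k window; a: [C, h, W], b: [h, W]."""
    a_win = a.unfold(1, k, 1).unfold(2, k, 1)          # view: [C, rows, out_w, k, k]
    mask = b.unfold(0, k, 1).unfold(1, k, 1) > c       # [rows, out_w, k, k]
    return (a_win * mask).sum((-2, -1)) / mask.sum((-2, -1))

def window_mean_multi_device(a, b, c, val, devices=None):
    """Split the output rows into one band per device (hypothetical helper)."""
    k = 2 * val
    if devices is None:
        n = torch.cuda.device_count()
        devices = [torch.device(f"cuda:{i}") for i in range(n)] or [torch.device("cpu")]
    out_h = a.shape[1] - k + 1
    band = -(-out_h // len(devices))  # ceiling division
    parts = []
    for i, dev in enumerate(devices):
        r0, r1 = i * band, min((i + 1) * band, out_h)
        if r0 >= r1:
            break
        # Each band needs k - 1 extra input rows (a halo) below its last output row.
        a_band = a[:, r0:r1 + k - 1].to(dev)
        b_band = b[r0:r1 + k - 1].to(dev)
        # Copying back with .cpu() waits for the result, so bands run one after another here.
        parts.append(window_mean(a_band, b_band, c, k).cpu())
    return torch.cat(parts, dim=1)
```

Splitting without the halo rows changes the windows at the band borders, which is one way a naive chunking can give wrong results. If a single band still does not fit on its device, it would have to be chunked further in the same manner.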