Memory usage suddenly increases with specific input shapes on torch.nn.Conv2d

jeff52415 · October 17, 2020, 3:46am

Summary: I found that GPU memory usage (allocated and cached) increases significantly in Conv2d for specific input shapes.
For example, going from torch.randn(14, 512, 2, 64) to torch.randn(15, 512, 2, 64) makes memory suddenly increase by 500~1000 MB, whereas changing 64 to 128 (the width dimension) brings memory usage back to normal.

import torch

layer = torch.nn.Conv2d(512,
                        512,
                        kernel_size=(3, 3),
                        stride=(2, 1),
                        padding=(1, 1),
                        bias=False).to('cuda:0').eval()

# Resets both the peak allocated and peak reserved (cached) counters;
# reset_max_memory_cached() is deprecated in favor of this call.
torch.cuda.reset_peak_memory_stats()
torch.cuda.empty_cache()

with torch.no_grad():
    inputs = torch.randn(15, 512, 2, 64).to('cuda:0')
    out = layer(inputs)
    
print('Max gpu memory allocated : {:.2f}'.format(torch.cuda.max_memory_allocated() / 1024 / 1024))
print('Max gpu memory cached : {:.2f}'.format(torch.cuda.max_memory_reserved() / 1024 / 1024))

## Output (values in MB)
## Max gpu memory allocated : 1885.24
## Max gpu memory cached : 2062.00

The weird thing is that when I change the width dimension from 64 to 128 (so the input tensor gets larger), or the batch size from 15 to 14, the memory usage drops significantly.

# Resets both the peak allocated and peak reserved (cached) counters.
torch.cuda.reset_peak_memory_stats()
torch.cuda.empty_cache()

with torch.no_grad():
    inputs = torch.randn(15, 512, 2, 128).to('cuda:0')
    out = layer(inputs)
    
print('Max gpu memory allocated : {:.2f}'.format(torch.cuda.max_memory_allocated() / 1024 / 1024))
print('Max gpu memory cached : {:.2f}'.format(torch.cuda.max_memory_reserved() / 1024 / 1024))

## Output (values in MB)
## Max gpu memory allocated : 680.68
## Max gpu memory cached : 846.00

Does anyone know the reason? Thanks in advance.

Versions:
torch 1.6.0
torchvision 0.7.0

albanD · October 17, 2020, 2:27pm

Hi,

cuDNN has many algorithms it can choose from to perform a convolution. The choice depends on the input size, so a small change in input size can trigger a different algorithm and thus different memory behavior.
You can try setting torch.backends.cudnn.benchmark=True so that cuDNN benchmarks the candidates and picks the best algorithm (for speed, not memory).
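To see where the algorithm switch happens, one option is to sweep over input shapes and record the peak allocated memory for each. The following is only a rough sketch: the helper names `peak_conv_memory_mb` and `sweep_batch_sizes` are made up for illustration, the layer configuration is copied from the question, and the numbers will differ across GPUs and cuDNN versions. With benchmark mode on, the first call for a new shape may also include memory used while trying out algorithms.

```python
import torch


def peak_conv_memory_mb(layer, shape, device):
    """Return the peak allocated CUDA memory (in MB) for one forward pass."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(device)
    with torch.no_grad():
        inputs = torch.randn(*shape, device=device)
        layer(inputs)
    # Wait for the kernel to finish before reading the counters.
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 1024 ** 2


def sweep_batch_sizes(batch_sizes, width=64, benchmark=False):
    """Map each batch size to its peak memory in MB (empty dict without CUDA)."""
    if not torch.cuda.is_available():
        return {}
    previous = torch.backends.cudnn.benchmark
    torch.backends.cudnn.benchmark = benchmark
    try:
        device = torch.device("cuda:0")
        # Same layer configuration as in the question.
        layer = torch.nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 1),
                                padding=(1, 1), bias=False).to(device).eval()
        return {n: peak_conv_memory_mb(layer, (n, 512, 2, width), device)
                for n in batch_sizes}
    finally:
        # Restore the global flag so the sweep has no side effects.
        torch.backends.cudnn.benchmark = previous


if __name__ == "__main__":
    for flag in (False, True):
        for n, mb in sweep_batch_sizes(range(12, 18), benchmark=flag).items():
            print('benchmark={} batch={} peak allocated: {:.2f} MB'.format(flag, n, mb))
```

A sudden jump between two neighboring batch sizes in the printed table would be consistent with a different algorithm being selected at that point.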

jeff52415 · October 17, 2020, 2:39pm

I see, thanks for the reply. One more question: do you have any suggestions for managing memory? The input size can vary between inference calls in my case, so I suffer from memory explosions from time to time.

albanD · October 17, 2020, 2:43pm

I'm afraid there is no specific tool for this.
You can also try torch.backends.cudnn.deterministic=True, which forces cuDNN to use special deterministic algorithms. This reduces the set of available algorithms and should prevent these switches.
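Since these flags are global, it can be convenient to set them only around the inference call and restore them afterwards. Here is a minimal sketch of that idea; the context manager `cudnn_flags` is a hypothetical helper written for this example, not a PyTorch API, and whether the deterministic setting actually lowers peak memory for a given shape has to be measured.

```python
import contextlib

import torch


@contextlib.contextmanager
def cudnn_flags(benchmark=None, deterministic=None):
    """Temporarily override cuDNN flags; None leaves a flag unchanged."""
    previous = (torch.backends.cudnn.benchmark,
                torch.backends.cudnn.deterministic)
    try:
        if benchmark is not None:
            torch.backends.cudnn.benchmark = benchmark
        if deterministic is not None:
            torch.backends.cudnn.deterministic = deterministic
        yield
    finally:
        # Always restore the previous global settings.
        (torch.backends.cudnn.benchmark,
         torch.backends.cudnn.deterministic) = previous


if __name__ == "__main__":
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    layer = torch.nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 1),
                            padding=(1, 1), bias=False).to(device).eval()
    with cudnn_flags(deterministic=True), torch.no_grad():
        out = layer(torch.randn(15, 512, 2, 64, device=device))
    print(tuple(out.shape))
```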

jeff52415 · October 17, 2020, 2:49pm

Thanks!
By the way, what about CPU inference? Does RAM usage face the same issue? And is there an API to monitor memory usage for CPU inference?

albanD · October 17, 2020, 3:00pm

On CPU we use our own algorithm by default. There is only one, so memory usage should change smoothly with input size.
If you explicitly use mkldnn or another CPU backend, though, you might see similar behavior.
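For monitoring memory during CPU inference, one generic possibility is to read the process's peak resident set size from the operating system. The sketch below uses Python's standard `resource` module, which is not a PyTorch API and is only available on Unix-like systems; the helper names `peak_rss_mb` and `peak_rss_growth_mb` are made up for this example, and the workload shown is just a stand-in allocation.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == 'darwin' else 1024
    return peak / divisor


def peak_rss_growth_mb(fn):
    """Run fn() and return how much the process's peak RSS grew, in MB."""
    before = peak_rss_mb()
    fn()
    return peak_rss_mb() - before


if __name__ == "__main__":
    # Stand-in workload: allocate and fill a 64 MB buffer.
    growth = peak_rss_growth_mb(lambda: bytes(bytearray(64 * 1024 * 1024)))
    print('Peak RSS grew by {:.2f} MB'.format(growth))
```

With PyTorch, the same helper could wrap a forward pass on a CPU layer, for example `peak_rss_growth_mb(lambda: layer(inputs))`. Because the peak only ever goes up, this reports growth beyond the previous high-water mark rather than the current usage.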