Memory usage suddenly increases with specific input shapes on torch.nn.Conv2d

Summary: I found that the memory allocated and cached by Conv2d increases significantly for specific input shapes.
For example, going from torch.randn(14, 512, 2, 64) to torch.randn(15, 512, 2, 64) can suddenly increase memory usage by 500~1000 MB, while changing the width dimension from 64 to 128 brings the memory usage back to normal.

import torch

layer = torch.nn.Conv2d(512,
                        512,
                        kernel_size=(3, 3),
                        stride=(2, 1),
                        padding=(1, 1),
                        bias=False).to('cuda:0').eval()

# Free cached blocks first, then restart the peak counters from that baseline.
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

with torch.no_grad():
    inputs = torch.randn(15, 512, 2, 64, device='cuda:0')
    out = layer(inputs)

print('Max gpu memory allocated : {:.2f}'.format(torch.cuda.max_memory_allocated() / 1024 / 1024))
print('Max gpu memory cached : {:.2f}'.format(torch.cuda.max_memory_reserved() / 1024 / 1024))

## return
## Max gpu memory allocated : 1885.24
## Max gpu memory cached : 2062.00

The weird thing is that when I change the width dimension from 64 to 128 (which makes the input larger), or the batch size from 15 to 14, the memory usage drops significantly.

# Free cached blocks first, then restart the peak counters from that baseline.
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

with torch.no_grad():
    inputs = torch.randn(15, 512, 2, 128, device='cuda:0')
    out = layer(inputs)

print('Max gpu memory allocated : {:.2f}'.format(torch.cuda.max_memory_allocated() / 1024 / 1024))
print('Max gpu memory cached : {:.2f}'.format(torch.cuda.max_memory_reserved() / 1024 / 1024))

## return
## Max gpu memory allocated : 680.68
## Max gpu memory cached : 846.00

Does anyone know the reason? Thanks in advance.

Version :
torch 1.6.0
torchvision 0.7.0

Hi,

cuDNN has many algorithms it can choose from to perform a convolution. The choice depends on the input size, so a small change in input size can trigger a different algorithm and thus different memory behavior.
You can try setting torch.backends.cudnn.benchmark=True so that cuDNN tries to choose the best algorithm (for speed, not memory).
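
To see where such an algorithm switch happens, one option is to sweep over a few input shapes and record the peak memory for each. The following is only a rough sketch: `conv_peak_mib` and `sweep` are hypothetical helper names, the layer is the one from the question, and `sweep` simply returns an empty dict when no CUDA device is present.

```python
import torch


def conv_peak_mib(layer, shape, device):
    """Peak (allocated, reserved) memory in MiB for one forward pass."""
    torch.cuda.empty_cache()                    # drop cached blocks from earlier runs
    torch.cuda.reset_peak_memory_stats(device)  # restart peak counters from current usage
    with torch.no_grad():
        layer(torch.randn(*shape, device=device))
    torch.cuda.synchronize(device)
    mib = 1024 * 1024
    return (torch.cuda.max_memory_allocated(device) / mib,
            torch.cuda.max_memory_reserved(device) / mib)


def sweep(shapes, benchmark=False, device="cuda:0"):
    """Map each input shape to its peak memory; empty dict without CUDA."""
    torch.backends.cudnn.benchmark = benchmark
    if not torch.cuda.is_available():
        return {}
    device = torch.device(device)
    layer = torch.nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 1),
                            padding=(1, 1), bias=False).to(device).eval()
    return {shape: conv_peak_mib(layer, shape, device) for shape in shapes}


if __name__ == "__main__":
    shapes = [(14, 512, 2, 64), (15, 512, 2, 64), (15, 512, 2, 128)]
    for shape, (allocated, reserved) in sweep(shapes).items():
        print(shape, "allocated {:.2f} MiB, reserved {:.2f} MiB".format(allocated, reserved))
```

Running the sweep once with `benchmark=False` and once with `benchmark=True` shows whether the flag changes the picture on a given GPU. Note that with benchmarking enabled, the first call for a new shape may itself use extra workspace while the candidate algorithms are tried.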


I see, thanks for the reply. One more question: do you have any suggestions for managing memory here? The input size can vary from one inference to the next in my case, so I suffer from memory blow-ups from time to time.

There is no specific tool for this, I'm afraid.
You can also try torch.backends.cudnn.deterministic=True to force the deterministic algorithms, which reduces the set of available algos and should prevent these switches.
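
A minimal sketch of scoping that flag to the inference call, so the rest of the program keeps its previous setting. The helper names `cudnn_deterministic` and `run_inference` are made up for illustration; only the `torch.backends.cudnn.deterministic` attribute itself comes from PyTorch.

```python
import contextlib

import torch


@contextlib.contextmanager
def cudnn_deterministic(enabled=True):
    """Temporarily set torch.backends.cudnn.deterministic, then restore it."""
    previous = torch.backends.cudnn.deterministic
    torch.backends.cudnn.deterministic = enabled
    try:
        yield
    finally:
        torch.backends.cudnn.deterministic = previous


def run_inference(layer, inputs):
    """Forward pass without autograd, restricted to deterministic cuDNN algorithms."""
    with cudnn_deterministic(), torch.no_grad():
        return layer(inputs)
```

Whether this actually lowers the peak memory for a given shape is something to measure rather than assume, since the flag only narrows the set of algorithms cuDNN may pick.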

Thanks!
Btw, how about CPU inference? Does RAM usage face the same issue? And is there an API to monitor memory usage for CPU inference?

On CPU we use our own algo by default. There is a single one, so all the usage should be smooth :)
If you explicitly use mkldnn or another CPU backend, though, you might see similar behavior.
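
For monitoring on CPU, one simple option is to read the process's peak resident set size around the forward pass. The sketch below assumes a Unix-like system and uses Python's standard `resource` module, which is an OS-level measurement and not a PyTorch API; `peak_rss_mib` is a hypothetical helper name.

```python
import resource
import sys

import torch


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


layer = torch.nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 1),
                        padding=(1, 1), bias=False).eval()

before = peak_rss_mib()
with torch.no_grad():
    out = layer(torch.randn(15, 512, 2, 64))
after = peak_rss_mib()

print('Peak RSS before : {:.2f} MiB'.format(before))
print('Peak RSS after  : {:.2f} MiB'.format(after))
```

Since the peak RSS only ever grows over the lifetime of a process, comparing different input shapes fairly would need one fresh process per shape.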
