Dropout eating up a lot of memory


(Abhishek Singh) #1

It seems like I am getting some extra memory overhead with dropout. Here is a toy example to illustrate the problem.

import torch.nn as nn
import torch.nn.functional as F
import torch
import gc

from py3nvml.py3nvml import *
nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)
# Helper: query the GPU's used memory via NVML and print it.
def print_used_memory(point):
  info = nvmlDeviceGetMemoryInfo(handle)
  print("Used memory: {:10.4f}GB at point {}".format(info.used/(1024**3), point))

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
    def forward(self, x):
        print_used_memory("before dropout")
        output = F.dropout(x, training=True)
        print(output.shape)
        print_used_memory("after dropout")
        return output

model = Test().cuda()

def run():
  device = torch.device('cuda')
  for i in range(1,2):
    x = torch.rand(30, 175, 4096).to(device)
    out = model(x)

run()

For this run, the output is:

Used memory:     0.7822GB at point before dropout
torch.Size([30, 175, 4096])
Used memory:     1.2705GB at point after dropout

AFAIK x will occupy (30*175*4096*32) / (8*1024*1024) = 82MB of memory, and since dropout clones x internally, the total should be 82*2 = 164MB. But as we can see, the difference here is roughly 490MB. Although the difference is not very high in this toy example, in my real model, where I stack multiple layers with dropout enabled in each, it makes the model run out of memory.
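As a sanity check, here is the same arithmetic in code (a minimal sketch on CPU; note that dropout also saves a mask tensor for the backward pass, whose dtype varies across PyTorch versions, so the true footprint is somewhat more than input + output):

import torch

x = torch.rand(30, 175, 4096)
# float32 elements take 4 bytes each
print(x.numel() * x.element_size() / 1024**2)  # -> 82.03125 (MB)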

UPDATE:
If I use inplace=True, there is a slight reduction in used memory after dropout (from 1.2705GB to 1.1885GB), which is exactly equal to the memory occupied by the output variable.
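For reference, the inplace call looks like this (a minimal sketch on CPU; the behaviour is the same on CUDA):

import torch
import torch.nn.functional as F

x = torch.rand(30, 175, 4096)
# inplace=True reuses the input's storage instead of allocating a new
# output tensor, saving one tensor-sized allocation (~82MB here).
out = F.dropout(x, training=True, inplace=True)
print(out.data_ptr() == x.data_ptr())  # True: same underlying storage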


(Nicolò Savioli) #2

I think the storage stays in global memory when training=True; try without this flag and let's see.


(Abhishek Singh) #3

If I set training=False then dropout doesn’t do anything: this line returns the original tensor when the flag is not enabled.
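A quick check of that pass-through behaviour (a minimal example):

import torch
import torch.nn.functional as F

x = torch.rand(4, 4)
# With training=False, dropout is a no-op and the values pass through.
out = F.dropout(x, training=False)
print(torch.equal(out, x))  # True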


(Nicolò Savioli) #4

Hey, sorry, you are using the functional version; try this instead:

import torch.nn as nn
drop = nn.Dropout2d()

Then use drop as a function, like drop(…).
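Applied to your Test module, that would look roughly like this (a sketch; I use nn.Dropout here since your input is 3D rather than the 4D that nn.Dropout2d expects):

import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        # nn.Dropout stores the dropout configuration on the module;
        # train/eval mode is then handled by model.train()/model.eval().
        self.drop = nn.Dropout()

    def forward(self, x):
        return self.drop(x)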

By the way, which cuDNN version are you using?


#5

I’ve tried your code with print(torch.cuda.memory_allocated()) instead of your nvml functions, since I’m not familiar with them.
It seems the dropout call allocates approx. 82MB:

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()

    def forward(self, x):
        print(torch.cuda.memory_allocated() / 1024**2)
        output = F.dropout(x, training=True)
        print(torch.cuda.memory_allocated() / 1024**2)
        return output


def run():
  for i in range(1,2):
    x = torch.rand(30, 175, 4096).to(device)
    out = model(x)


device = torch.device('cuda')
model = Test().to(device)

run()
> 165
> 247
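
The remaining gap you see with nvml most likely comes from the CUDA context plus free blocks held by PyTorch’s caching allocator, which nvml counts but memory_allocated() does not. You can compare both views like this (a sketch; memory_reserved() was called memory_cached() in older PyTorch versions):

import torch

device = torch.device('cuda')
x = torch.rand(30, 175, 4096, device=device)

# Memory occupied by live tensors only:
print(torch.cuda.memory_allocated() / 1024**2)
# Memory held by the caching allocator (live tensors + cached free blocks);
# nvml additionally counts the CUDA context on top of this.
print(torch.cuda.memory_reserved() / 1024**2)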