CPU RAM is increasing while profiling

Hello All

I have been trying to profile various two-layer CNNs and I find my RAM usage increasing linearly. I have attached my profiling code snippet below. Is there something I can do to reduce the RAM usage?

for i in range(len(twopair)):
    model = nn.Sequential(
        nn.Conv2d(input_channels, twopair[i][0], kernel, stride, padding, bias=False),
        nn.Conv2d(twopair[i][0], twopair[i][1], kernel, stride, padding, bias=False))
    time = []
    for j in range(ExpEpoch):
        x = torch.randn([batch_size, input_channels, input_dim, input_dim])
        with torch.autograd.profiler.profile() as prof:
            y = model(x)
        time.append(prof.self_cpu_time_total)
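
A simple way to see the growth is to print the resident set size of the process after each iteration. This is only a measurement sketch (it assumes psutil is installed and uses made-up channel counts), separate from the timing code itself:

import psutil
import torch
import torch.nn as nn

proc = psutil.Process()
# Hypothetical two-layer model, same structure as above
model = nn.Sequential(nn.Conv2d(3, 64, 3, 1, 1, bias=False),
                      nn.Conv2d(64, 128, 3, 1, 1, bias=False))

for j in range(30):
    x = torch.randn([1, 3, 224, 224])
    with torch.autograd.profiler.profile() as prof:
        y = model(x)
    # Resident set size of this process in MB (the number that keeps growing)
    print(j, proc.memory_info().rss / 1024 ** 2, "MB resident")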

Thanks in advance for any replies!

I’m seeing an increased memory usage as well, which is freed eventually.
You might try lowering the garbage-collection thresholds using gc.set_threshold().

Anyway, I would assume the garbage collector should kick in automatically once you run out of memory.
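
As a rough sketch of what I mean (the lowered thresholds here are only example values, not a recommendation):

import gc

# The defaults are usually (700, 10, 10); lowering the first value makes
# generation-0 collections run more frequently.
print(gc.get_threshold())
gc.set_threshold(100, 10, 10)

# You can also force a full collection manually, e.g. once per profiled model.
gc.collect()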


Unfortunately, the GC didn’t kick in automatically and the process got killed after running out of memory.


That might be another issue.
Could you post the undefined variables so that we could try to reproduce it?

I just used some dummy setups and saw an increase in memory usage, which was then cleared after a few iterations.


Sure. I have been trying to profile convolutions with a large number of filters on CPUs and found that the process gets killed on a few machines with 16 or 32 GB of RAM.

import platform
import statistics

import torch
import torch.nn as nn

# Experiment configuration
ExpEpoch = 30
input_dim = 224
batch_size = 1
kernel = 3
stride = 1
padding = 1
input_channels = 3
OutputLayers = [3, 6, 12, 16, 18, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 384, 432, 480, 512, 576, 640, 672, 720, 768, 896, 960]

# All (first-layer, second-layer) output-channel combinations
twopair = []
for i in OutputLayers:
    for j in OutputLayers:
        twopair.append([i, j])

name = "backupconv_" + platform.node() + "two.csv"
bfile = open(name, 'a')
layer = 2

for i in range(len(twopair)):
    model = nn.Sequential(
        nn.Conv2d(input_channels, twopair[i][0], kernel, stride, padding, bias=False),
        nn.Conv2d(twopair[i][0], twopair[i][1], kernel, stride, padding, bias=False))
    time = []
    for j in range(ExpEpoch):
        x = torch.randn([batch_size, input_channels, input_dim, input_dim])
        with torch.autograd.profiler.profile() as prof:
            y = model(x)
        time.append(prof.self_cpu_time_total)
    mean_time = statistics.mean(time)
    vari_time = statistics.stdev(time)
    point = [layer, input_dim, kernel, stride, padding, input_channels,
             twopair[i][0], twopair[i][1], mean_time, vari_time]
    writestring = ''  # start a fresh CSV row for each configuration
    for itr in point:
        writestring = writestring + str(itr) + ','
    writestring += '\n'
    bfile.write(writestring)

bfile.close()

Were you able to resolve the memory error? I am facing a similar problem: the process gets killed when it runs out of memory. Is there a way to resolve this issue?

Thanks.

Can the out-of-memory error (where the process gets killed) be resolved with torch.autograd.profiler.emit_nvtx()?
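
For context, I mean wrapping the forward pass roughly like this (just a sketch of how I understand emit_nvtx() is meant to be used; it requires a CUDA build and an external profiler such as nvprof, and the channel counts are hypothetical):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, 1, 1, bias=False),
                      nn.Conv2d(64, 128, 3, 1, 1, bias=False))
x = torch.randn(1, 3, 224, 224)

# Annotates operators with NVTX ranges; the script still has to be launched
# under the external profiler (e.g. nvprof) for the ranges to be recorded.
with torch.autograd.profiler.emit_nvtx():
    y = model(x)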

The original issue reported an increase in CPU RAM usage. Are you seeing the same, or is your GPU running out of memory?
In the latter case, could you install the latest nightly binary (or build from source), as this fix might be related to an increase in GPU memory usage.
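
If it is unclear which of the two is growing, a quick check of the GPU side is (diagnostic sketch only):

import torch

if torch.cuda.is_available():
    # Memory currently held by tensors vs. the peak since the start of the run, in MB
    print(torch.cuda.memory_allocated() / 1024 ** 2, "MB allocated")
    print(torch.cuda.max_memory_allocated() / 1024 ** 2, "MB peak")
else:
    # No CUDA device, so any growth reported by the OS must be CPU RAM.
    print("CUDA not available; the growth must be in CPU RAM")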