CPU RAM is increasing while profiling

Hello All

I have been trying to profile various two-layer CNNs and I find my RAM usage increasing linearly. I have attached my profiling code snippet below. Is there something I can do to reduce the RAM usage?

for i in range(len(twopair)):
    model = nn.Sequential(
        nn.Conv2d(input_channels, twopair[i][0], kernel, stride, padding, bias=False),
        nn.Conv2d(twopair[i][0], twopair[i][1], kernel, stride, padding, bias=False))
    time = []
    for j in range(ExpEpoch):
        x = torch.randn([batch_size, input_channels, input_dim, input_dim])
        with torch.autograd.profiler.profile() as prof:
            y = model(x)
        time.append(prof.self_cpu_time_total)
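
A simple way to see the growth is to print the resident set size of the process after each iteration. This is only a measurement sketch (it assumes psutil is installed and uses made-up channel counts), separate from the timing code itself:

import psutil
import torch
import torch.nn as nn

proc = psutil.Process()
# Hypothetical two-layer model, same structure as above
model = nn.Sequential(nn.Conv2d(3, 64, 3, 1, 1, bias=False),
                      nn.Conv2d(64, 128, 3, 1, 1, bias=False))

for j in range(30):
    x = torch.randn([1, 3, 224, 224])
    with torch.autograd.profiler.profile() as prof:
        y = model(x)
    # Resident set size of this process in MB (the number that keeps growing)
    print(j, proc.memory_info().rss / 1024 ** 2, "MB resident")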

Thanks in advance for any replies!

I’m seeing an increased memory usage as well, which is freed eventually.
You might try lowering the garbage-collection thresholds using gc.set_threshold().

Anyway, I would assume the garbage collector should kick in automatically once you run out of memory.
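
As a rough sketch of what I mean (the lowered thresholds here are only example values, not a recommendation):

import gc

# The defaults are usually (700, 10, 10); lowering the first value makes
# generation-0 collections run more frequently.
print(gc.get_threshold())
gc.set_threshold(100, 10, 10)

# You can also force a full collection manually, e.g. once per profiled model.
gc.collect()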


Unfortunately, the GC didn’t kick in automatically and the process got killed after running out of memory.


That might be another issue.
Could you post the undefined variables so that we could try to reproduce it?

I just used some dummy setups and saw an increase in memory usage, which was then cleared after a few iterations.


Sure. I have been trying to profile convolutions with a large number of filters on CPUs and found that the process gets killed on a few machines with 16 or 32 GB of RAM.

import platform
import statistics

import torch
import torch.nn as nn

# Experiment configuration
ExpEpoch = 30
input_dim = 224
batch_size = 1
kernel = 3
stride = 1
padding = 1
input_channels = 3
OutputLayers = [3, 6, 12, 16, 18, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 384, 432, 480, 512, 576, 640, 672, 720, 768, 896, 960]

# All (first-layer, second-layer) output-channel combinations
twopair = []
for i in OutputLayers:
    for j in OutputLayers:
        twopair.append([i, j])

name = "backupconv_" + platform.node() + "two.csv"
bfile = open(name, 'a')
layer = 2

for i in range(len(twopair)):
    model = nn.Sequential(
        nn.Conv2d(input_channels, twopair[i][0], kernel, stride, padding, bias=False),
        nn.Conv2d(twopair[i][0], twopair[i][1], kernel, stride, padding, bias=False))
    time = []
    for j in range(ExpEpoch):
        x = torch.randn([batch_size, input_channels, input_dim, input_dim])
        with torch.autograd.profiler.profile() as prof:
            y = model(x)
        time.append(prof.self_cpu_time_total)
    mean_time = statistics.mean(time)
    vari_time = statistics.stdev(time)
    point = [layer, input_dim, kernel, stride, padding, input_channels,
             twopair[i][0], twopair[i][1], mean_time, vari_time]
    writestring = ''  # start a fresh CSV row for each configuration
    for itr in point:
        writestring = writestring + str(itr) + ','
    writestring += '\n'
    bfile.write(writestring)

bfile.close()

Were you able to resolve the memory error? I am facing a similar problem: the process gets killed when it runs out of memory. Is there a way to resolve this issue?

Thanks.

Can the out-of-memory error (where the process gets killed) be resolved with torch.autograd.profiler.emit_nvtx()?
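
For context, I mean wrapping the forward pass roughly like this (just a sketch of how I understand emit_nvtx() is meant to be used; it requires a CUDA build and an external profiler such as nvprof, and the channel counts are hypothetical):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, 1, 1, bias=False),
                      nn.Conv2d(64, 128, 3, 1, 1, bias=False))
x = torch.randn(1, 3, 224, 224)

# Annotates operators with NVTX ranges; the script still has to be launched
# under the external profiler (e.g. nvprof) for the ranges to be recorded.
with torch.autograd.profiler.emit_nvtx():
    y = model(x)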

The original issue reported an increase in CPU RAM usage. Are you seeing the same, or is your GPU running out of memory?
In the latter case, could you install the latest nightly binary (or build from source), as this fix might be related to an increase in GPU memory usage.
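
If it is unclear which of the two is growing, a quick check of the GPU side is (diagnostic sketch only):

import torch

if torch.cuda.is_available():
    # Memory currently held by tensors vs. the peak since the start of the run, in MB
    print(torch.cuda.memory_allocated() / 1024 ** 2, "MB allocated")
    print(torch.cuda.max_memory_allocated() / 1024 ** 2, "MB peak")
else:
    # No CUDA device, so any growth reported by the OS must be CPU RAM.
    print("CUDA not available; the growth must be in CPU RAM")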