Cannot measure CPU usage on MNIST example

I am trying to measure CPU usage for the MNIST example on PyTorch 0.4.0 with the following command, but it fails.
How can I avoid this issue?

$ python -m torch.utils.bottleneck main.py --no-cuda

===
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 149, in run_prof
    exec(code, globs, None)
  File "main.py", line 110, in <module>
    main()
  File "main.py", line 105, in main
    train(args, model, device, train_loader, optimizer, epoch)
  File "main.py", line 29, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 264, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 264, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 77, in __getitem__
    img = self.transform(img)
  File "/opt/conda/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/opt/conda/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 143, in __call__
    return F.normalize(tensor, self.mean, self.std)
  File "/opt/conda/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 167, in normalize
    for t, m, s in zip(tensor, mean, std):
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 361, in <lambda>
    return iter(imap(lambda i: self[i], range(self.size(0))))
RuntimeError: /pytorch/torch/csrc/autograd/profiler.h:53: out of memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 280, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 261, in main
    autograd_prof_cpu, autograd_prof_cuda = run_autograd_prof(code, globs)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 155, in run_autograd_prof
    result.append(run_prof(use_cuda=True))
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 149, in run_prof
    exec(code, globs, None)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/profiler.py", line 191, in __exit__
    records = torch.autograd._disable_profiler()
RuntimeError: /pytorch/torch/csrc/autograd/profiler.h:53: out of memory

References
MNIST example: https://github.com/pytorch/examples/blob/master/mnist/main.py
bottleneck: https://pytorch.org/docs/stable/bottleneck.html

I think this problem is related to your other thread.

Thank you for commenting.
In this case, the GPU check works fine, but it fails in the MNIST training step.

It seems your GPU is still out of memory.
Could you check its memory usage with nvidia-smi?
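
If it is easier, you could also log it from inside the training loop. A minimal sketch, assuming the memory instrumentation added in 0.4.0 (torch.cuda.memory_allocated / max_memory_allocated) is available in your build; the helper name log_gpu_memory is mine:

import torch

def log_gpu_memory(tag=""):
    # Print the currently allocated and the peak GPU memory in MB.
    if torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 1024 ** 2
        peak = torch.cuda.max_memory_allocated() / 1024 ** 2
        print("{} allocated: {:.1f} MB, peak: {:.1f} MB".format(tag, alloc, peak))

# e.g. call it once per epoch from train()
log_gpu_memory("epoch end")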

I checked the nvidia-smi output during MNIST training, but GPU memory usage only goes up to about 25% (it is not fully allocated).
I also found the following:
1) MNIST training with 2 epochs works fine under the bottleneck profiler.
2) MNIST training with 3 epochs produces the following error under the bottleneck profiler.
(I ran it twice; the out-of-memory trace is the same both times.)

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 149, in run_prof
    exec(code, globs, None)
  File "main.py", line 109, in <module>
    main()
  File "main.py", line 104, in main
    train(args, model, device, train_loader, optimizer, epoch)
  File "main.py", line 29, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 264, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 264, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 77, in __getitem__
    img = self.transform(img)
  File "/opt/conda/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/opt/conda/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 143, in __call__
    return F.normalize(tensor, self.mean, self.std)
  File "/opt/conda/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 167, in normalize
    for t, m, s in zip(tensor, mean, std):
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 361, in <lambda>
    return iter(imap(lambda i: self[i], range(self.size(0))))
RuntimeError: /pytorch/torch/csrc/autograd/profiler.h:53: out of memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 280, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 261, in main
    autograd_prof_cpu, autograd_prof_cuda = run_autograd_prof(code, globs)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 155, in run_autograd_prof
    result.append(run_prof(use_cuda=True))
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 149, in run_prof
    exec(code, globs, None)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/profiler.py", line 191, in __exit__
    records = torch.autograd._disable_profiler()
RuntimeError: /pytorch/torch/csrc/autograd/profiler.h:53: out of memory

I watched RSS in top (not nvidia-smi).
The RSS value grows until memory is full, and then the call trace above appears.
I also found that "main.py --no-cuda" is executed several times (about 3 times) under the bottleneck profiler.
Is there a good way to solve this?

$ time python -m torch.utils.bottleneck main.py --no-cuda

The RSS value keeps increasing:

top - 09:42:05 up 1 day,  9:58,  0 users,  load average: 1.42, 2.43, 2.34
Tasks:  15 total,   2 running,  13 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.8 us,  4.6 sy,  0.7 ni, 68.8 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16025224 total,  2487640 free, 11505224 used,  2032360 buff/cache
KiB Swap: 16379900 total,  6105128 free, 10274772 used.  3980392 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  2319 jovyan    20   0 23.346g 9.652g 318348 R 100.3 63.2  16:09.42 python
     1 root      20   0    4364      0      0 S   0.0  0.0   0:03.41 tini
     7 root      20   0   49152    180    176 S   0.0  0.0   0:00.00 sudo
    16 jovyan    20   0 2110016  25644   4456 S   0.0  0.2   0:51.18 jupyterhub-sing
    24 jovyan    20   0   20120    132    128 S   0.0  0.0   0:00.35 bash
    63 jovyan    20   0   20120     24     24 S   0.0  0.0   0:00.37 bash
   153 jovyan    20   0   20120    136    132 S   0.0  0.0   0:00.43 bash
   187 jovyan    20   0 2153728   7640    900 S   0.0  0.0   0:05.49 python
   262 jovyan    20   0   20132   2388   1996 S   0.0  0.0   0:00.59 bash
   539 jovyan    20   0   20120    256    252 S   0.0  0.0   0:00.43 bash
  1119 jovyan    20   0   20120    284    280 S   0.0  0.0   0:00.43 bash
  1286 jovyan    20   0   24332    300    296 S   0.0  0.0   0:00.00 git
  1287 jovyan    20   0    8376      0      0 S   0.0  0.0   0:00.00 pager
  1404 jovyan    20   0   20120   2248   1868 S   0.0  0.0   0:00.54 bash
  2352 jovyan    20   0   40392   3600   3128 R   0.0  0.0   0:00.28 top
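
To double-check that it is the process memory (RSS) rather than GPU memory that grows, I can also log the peak RSS from inside the script. A minimal sketch using the standard-library resource module; the helper name peak_rss_mb is my own:

import resource

def peak_rss_mb():
    # On Linux, ru_maxrss is the peak resident set size in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

# e.g. print it after every epoch
print("peak RSS: {:.1f} MB".format(peak_rss_mb()))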

@ptrblck
I have one question and one request. Would you take a look?

  1. Question about the memory usage of torch.utils.bottleneck with MNIST.
    Training MNIST with --epochs 3 --no-cuda works fine, but it fails with --epochs 4 --no-cuda on my server, which has 16 GB of RAM. It looks like a RAM (RES) consumption problem. I guess the MNIST training example would need something like 64 GB of RAM under torch.utils.bottleneck.
    Is my guess correct?

  2. Request about the torch.utils.bottleneck documentation.
    It would be helpful to document how to customize the trace output (topk etc.), e.g. by copying the module locally as shown below.

cp torch/utils/bottleneck/__main__.py bottleneck.py
python bottleneck.py /script/path/to/source/script.py [args]
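
To illustrate what I mean, the customization in the local copy could be something like the rough sketch below, which prints only the top-k entries of an autograd profiler run sorted by total CPU time. print_topk and topk are my own names, not existing bottleneck options, and I assume FunctionEventAvg exposes key and cpu_time_total the same way the built-in table printing uses them:

import torch
from torch.autograd import profiler

def print_topk(prof, topk=15):
    # Sort the averaged events by total CPU time and print the first topk rows.
    events = sorted(prof.key_averages(), key=lambda e: e.cpu_time_total, reverse=True)
    for evt in events[:topk]:
        print("{:<40} {:>12.0f} us".format(evt.key, evt.cpu_time_total))

# small self-contained example instead of the full MNIST script
x = torch.randn(64, 10)
lin = torch.nn.Linear(10, 10)
with profiler.profile() as prof:
    lin(x).sum().backward()
print_topk(prof, topk=10)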

Try to run the code with --epochs 1 and see the memory usage. It should fit on your machine.
I don’t quite get your second point. Would you like to use topk in the output of the profiling?
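
If even one epoch uses too much memory in the autograd pass, another option is to skip bottleneck and wrap only a handful of iterations in torch.autograd.profiler.profile yourself, so the profiler's event buffer stays small. A minimal, self-contained sketch; the dummy model and random data just stand in for the real MNIST model and train_loader from main.py:

import torch
import torch.nn.functional as F
from torch.autograd import profiler

# Stand-ins for the real model / optimizer from main.py
model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

with profiler.profile() as prof:
    for step in range(20):  # profile only a few steps, not the whole run
        data = torch.randn(64, 784)
        target = torch.randint(0, 10, (64,), dtype=torch.long)
        optimizer.zero_grad()
        loss = F.nll_loss(F.log_softmax(model(data), dim=1), target)
        loss.backward()
        optimizer.step()

print(prof)  # or prof.key_averages() for an aggregated view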

@ptrblck Thank you for commenting. It is helpful.

  1. For MNIST with epochs=1 and --no-cuda, RES reaches about 4.8 GB at its peak.

    If I want to reduce the memory usage of the profiler,
    do I need to add entries to the following DONT_PROFILE list and rebuild?
    Or is there another workaround?

    https://github.com/pytorch/pytorch/blob/v0.4.0/tools/autograd/gen_variable_type.py#L53

  2. Yes, that is a nice-to-have.
    It would help PyTorch users to document how to customize the profiler output.
    My point is that this does not require rebuilding the package; you only need to copy the module locally.

P.S.
If I profile the MNIST training with cProfile only, i.e. without the autograd profiler, process memory consumption stays around 2 GB of RAM (for epochs=10, the default value).
It seems the autograd profiler's Event records consume a lot of memory.
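
For reference, this is roughly how I did the cProfile-only measurement (the output file name and the number of printed rows are arbitrary):

python -m cProfile -o mnist.prof main.py --no-cuda
python -c "import pstats; pstats.Stats('mnist.prof').sort_stats('cumulative').print_stats(15)"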