Memory problem with bottleneck profiler

Hi there,

I have been trying to profile my code using torch.utils.bottleneck and received the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code                                                                                  
    exec(code, run_globals)
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 231, in <module>
    main()
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 210, in main
    autograd_prof_cpu, autograd_prof_cuda = run_autograd_prof(code, globs)
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 102, in run_autograd_prof
    result = [run_prof(use_cuda=False)]
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 98, in run_prof
    exec(code, globs, None)
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/autograd/profiler.py", line 262, in __exit__
    records = torch.autograd._disable_profiler()
MemoryError: std::bad_alloc

Furthermore, it appears to run my script twice (I imagine once for CPU and once for CUDA), and upon the second run completing (successfully), this error gets printed.
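
For reference, I'm invoking it in the standard way (my_script.py stands in for my actual entry point):

python -m torch.utils.bottleneck my_script.py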

Could you post a minimal code snippet to reproduce this error, please?

Turns out that the code I was debugging was simply too complex for the profiler; I scaled it back considerably and it works fine.

I have this problem too. @fiorenza2, what do you mean by the code being too complex for the profiler? Could you clarify what you meant by “scaled it back”?

Hi there,

It’s been a while, but basically the code was loading very large tensors into memory (I believe this was model-based RL, so the trajectory data was very large). What I believe I did was create a scaled-down version with much smaller hyperparameters (e.g., rollout horizon) to reduce the memory usage. That meant I was able to get a meaningful breakdown of where the time was being spent.
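
To make that concrete, here is a rough sketch of the kind of scaled-down stand-in I mean; the model, dimensions, and rollout horizon are all placeholders, not my actual code:

import torch
import torch.nn as nn

# Placeholder hyperparameters, deliberately small so the autograd
# profiler's event buffer stays manageable (the original run used a
# far longer horizon and much larger tensors).
ROLLOUT_HORIZON = 16
BATCH_SIZE = 8
OBS_DIM = 32

model = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, OBS_DIM))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

obs = torch.randn(BATCH_SIZE, OBS_DIM)
for _ in range(ROLLOUT_HORIZON):
    pred = model(obs)          # forward pass
    loss = pred.pow(2).mean()  # dummy loss
    opt.zero_grad()
    loss.backward()            # backward pass recorded by the profiler
    opt.step()

Running this stand-in under python -m torch.utils.bottleneck exercises the same forward/backward structure as the full run, just with far fewer recorded events for the profiler to hold in memory.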