Memory problem with bottleneck profiler

Hi there,

I have been trying to profile my code using torch.utils.bottleneck and received the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code                                                                                  
    exec(code, run_globals)
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 231, in <module>
    main()
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 210, in main
    autograd_prof_cpu, autograd_prof_cuda = run_autograd_prof(code, globs)
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 102, in run_autograd_prof
    result = [run_prof(use_cuda=False)]
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 98, in run_prof
    exec(code, globs, None)
  File "/home/phil/.envs/rp1env/lib/python3.6/site-packages/torch/autograd/profiler.py", line 262, in __exit__
    records = torch.autograd._disable_profiler()
MemoryError: std::bad_alloc

Furthermore, it appears to run my script twice (I imagine once for CPU and once for CUDA), and upon the second run completing (successfully), this error gets printed.
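
For reference, I'm invoking it in the standard way (my_script.py stands in for my actual entry point):

python -m torch.utils.bottleneck my_script.py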

Could you post a minimal code snippet to reproduce this error, please?

Turns out that the code I was debugging was simply too complex for the profiler; I scaled it back considerably and it works fine.

I have this problem too. @fiorenza2, what do you mean by the code being too complex for the profiler? Could you clarify what you meant by “scaled it back”?

Hi there,

It’s been a while, but basically the code was loading very large tensors into memory (I believe this was model-based RL, so the trajectory data was very large). What I believe I did was create a scaled-down version with much smaller hyperparameters (e.g., rollout horizon) to reduce the memory usage. That meant I was able to get a meaningful breakdown of where the time was being spent.
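
To make that concrete, here is a rough sketch of the kind of scaled-down stand-in I mean; the model, dimensions, and rollout horizon are all placeholders, not my actual code:

import torch
import torch.nn as nn

# Placeholder hyperparameters, deliberately small so the autograd
# profiler's event buffer stays manageable (the original run used a
# far longer horizon and much larger tensors).
ROLLOUT_HORIZON = 16
BATCH_SIZE = 8
OBS_DIM = 32

model = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, OBS_DIM))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

obs = torch.randn(BATCH_SIZE, OBS_DIM)
for _ in range(ROLLOUT_HORIZON):
    pred = model(obs)          # forward pass
    loss = pred.pow(2).mean()  # dummy loss
    opt.zero_grad()
    loss.backward()            # backward pass recorded by the profiler
    opt.step()

Running this stand-in under python -m torch.utils.bottleneck exercises the same forward/backward structure as the full run, just with far fewer recorded events for the profiler to hold in memory.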