Should I pre-allocate all CUDA tensors?

In Lua Torch, I needed to preallocate all CUDA tensors in order to:

  • avoid sync points associated with allocation
  • avoid running out of memory…

Is this still a requirement/recommendation for PyTorch?

(I’m getting OOM errors using an LSTM; I'm not sure if this is because I need to pre-allocate tensors, or … ?)

No. :grinning:

In PyTorch:

  • Freeing CUDA tensors does not synchronize because the caching allocator holds onto and reuses the memory segment.
  • Tensors are freed immediately when they go out of scope (because of Python’s ref-counting). In Lua Torch, tensors were not freed until the garbage collector ran.
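
You can see both behaviors with a minimal sketch (assumes a CUDA-capable GPU and a recent PyTorch version that provides `torch.cuda.memory_reserved`):

```python
import torch

def show(label):
    # memory_allocated: bytes currently in use by live tensors
    # memory_reserved: bytes the caching allocator holds on to (not returned to the driver)
    print(f"{label}: allocated={torch.cuda.memory_allocated()} "
          f"reserved={torch.cuda.memory_reserved()}")

show("start")

x = torch.randn(1024, 1024, device="cuda")  # ~4 MB allocation
show("after allocating x")

del x  # refcount hits zero -> freed immediately, no garbage collector involved
show("after del x")  # allocated drops, reserved stays (block is cached)

y = torch.randn(1024, 1024, device="cuda")  # reuses the cached block, no new cudaMalloc
show("after allocating y")
```

After `del x`, `memory_allocated` drops right away while `memory_reserved` stays the same, and the allocation of `y` is served from the cache rather than a fresh `cudaMalloc`, so there is no sync point.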