In Lua Torch, I needed to preallocate all CUDA tensors in order to:
- avoid sync points associated with allocation
- avoid running out of memory…
Is this still a requirement/recommendation for PyTorch?
(I’m getting OOM errors using an LSTM. Not sure if this is because I need to pre-allocate stuff, or … ?)
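For concreteness, this is roughly the pattern I mean by “preallocating” — allocating a buffer once and reusing it with in-place ops, versus creating a fresh CUDA tensor every iteration. The tensor names and sizes here are just placeholders:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# What I did in Lua Torch: allocate the buffer once up front...
buf = torch.empty(64, 1024, device=device)

for step in range(100):
    # ...then reuse it with in-place ops instead of allocating a new
    # tensor on the GPU each iteration.
    buf.normal_()        # fill in place; stands in for real data loading
    out = buf * 2        # (actual work would go here)

# Versus the "naive" version that allocates fresh tensors every step:
for step in range(100):
    x = torch.randn(64, 1024, device=device)  # new allocation each iteration
    out = x * 2
```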