ATen memory allocation for intermediate tensors

I am looking at the example here. It is, of course, very convenient to use things like zeros_like :+1:

auto new_h = at::zeros_like(old_cell);

This is great when the allocated tensor needs to be returned, but what about intermediate variables? Should we call the forward function with a pre-allocated buffer each time, or is it better to keep a global, reused buffer? Or is it still efficient to just use empty_like/zeros_like inside forward?

What is the recommended way for a large intermediate tensor?

The backend uses a custom CUDA allocator (a caching allocator), so allocating CUDA tensors is very efficient if you do the same allocations repeatedly. It's fine for your forward to create new tensors.
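For illustration, here is a minimal sketch of that pattern (the function name, shapes, and operations are made up for this example, not the tutorial's actual code; it assumes input is (N, F) and weight is (F, F)). Intermediates are allocated fresh on every call, and the CUDA caching allocator makes repeated same-size allocations cheap after the first few calls:

#include <torch/extension.h>

// Hypothetical forward: intermediate tensors are allocated on each call.
// Freed blocks are kept in the caching allocator and reused, so repeated
// allocations of the same sizes avoid expensive cudaMalloc calls.
at::Tensor forward(const at::Tensor& input, const at::Tensor& weight) {
  // Intermediate whose contents are fully overwritten: at::empty_like
  // skips the zero-fill that at::zeros_like would perform.
  auto pre_activation = at::empty_like(input);
  at::mm_out(pre_activation, input, weight);   // write into the buffer

  // Intermediate that must start at zero: use at::zeros_like.
  auto new_h = at::zeros_like(input);
  new_h.add_(at::sigmoid(pre_activation));

  // When pre_activation goes out of scope, its memory returns to the
  // allocator's cache and is reused on the next call.
  return new_h;
}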
