I am looking at the example here. It is very convenient of course to use things like
auto new_h = at::zeros_like(old_cell);
This is great when the allocated tensor needs to be returned, but what about intermediate variables? Should we call the
forward function with a pre-allocated buffer each time, or is it better to use global memory? Or is it still efficient to use
What is the recommended way for a large intermediate tensor?