Recently, I looked into PyTorch's memory management mechanism and found that memory blocks are allocated separately for different streams, which I think can cause memory over-allocation in some cases.
I wonder why the developers designed it this way and what factors they considered.
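To make the over-allocation concern concrete, here is a minimal toy model. It is not PyTorch's actual allocator, and the class and function names are hypothetical. It assumes that freed blocks are cached per stream and are only reused by later allocations on the same stream. It also ignores block splitting, size rounding, and synchronization. When same-sized allocations alternate between two streams, the per-stream version reserves twice as much memory as a single shared pool would.

```python
from collections import defaultdict


class ToyCachingAllocator:
    """Hypothetical toy model of a caching allocator (not PyTorch's code).

    If per_stream is True, freed blocks are cached under the stream that
    allocated them and can only be reused by that stream. If False, all
    streams share one cache.
    """

    def __init__(self, per_stream):
        self.per_stream = per_stream
        self.cache = defaultdict(list)  # pool key -> sizes of cached free blocks
        self.reserved = 0  # total bytes requested from the "device"

    def _key(self, stream):
        # Shared pool: every stream maps to the same cache key.
        return stream if self.per_stream else 0

    def malloc(self, size, stream):
        pool = self.cache[self._key(stream)]
        fits = [b for b in pool if b >= size]
        if fits:
            block = min(fits)  # best fit among cached blocks in this pool
            pool.remove(block)
        else:
            block = size
            self.reserved += size  # cache miss: reserve new device memory
        return (block, self._key(stream))

    def free(self, handle):
        size, key = handle
        self.cache[key].append(size)  # return the block to its pool, keep it reserved


def simulate(per_stream, steps=4, size=100):
    """Alternate allocate/free of the same size between streams 0 and 1."""
    alloc = ToyCachingAllocator(per_stream)
    for step in range(steps):
        handle = alloc.malloc(size, stream=step % 2)
        alloc.free(handle)
    return alloc.reserved


print(simulate(per_stream=True))   # separate pools per stream
print(simulate(per_stream=False))  # one shared pool
```

In the shared-pool case the block is reused across streams. In a real GPU allocator, that reuse would presumably require some synchronization to ensure the other stream's pending work on the block has finished. This may be part of the trade-off behind the design.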
Thanks and best wishes