A memory difference depending on whether the tensor was created on the GPU or pushed to the GPU? Strange.

The del doesn’t make any difference, but the empty_cache() does something.
Basically, it aggressively frees up memory (which slows the process down) and changes the memory fragmentation. If you’re lucky, there will be less fragmentation and you won’t OOM.
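
If you want to see this for yourself, here is a minimal sketch that records `torch.cuda.memory_allocated()` (bytes held by live tensors) and `torch.cuda.memory_reserved()` (bytes PyTorch's caching allocator keeps from the driver) around a `del` and an `empty_cache()`. The helper name `cuda_memory_demo` and the 64 MiB buffer size are my own assumptions for illustration, not something from your code, and the exact numbers will depend on your GPU and PyTorch version.

```python
try:
    import torch
except ImportError:  # assumption: let the sketch degrade gracefully without PyTorch
    torch = None


def cuda_memory_demo(n_bytes=64 * 1024 * 1024):
    """Hypothetical helper: return (allocated, reserved) byte counts per step.

    Returns None when PyTorch or a CUDA device is not available.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    def snapshot():
        # allocated: memory used by live tensors
        # reserved: memory the caching allocator holds from the driver
        return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()

    stats = {"start": snapshot()}

    # One byte per element, so the tensor occupies roughly n_bytes.
    x = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")
    stats["after_alloc"] = snapshot()

    # Dropping the last reference hands the block back to the allocator's cache.
    del x
    stats["after_del"] = snapshot()

    # Release unused cached blocks back to the driver.
    torch.cuda.empty_cache()
    stats["after_empty_cache"] = snapshot()
    return stats


if __name__ == "__main__":
    result = cuda_memory_demo()
    if result is None:
        print("CUDA not available; nothing to measure.")
    else:
        for step, (allocated, reserved) in result.items():
            print(f"{step:>18}: allocated={allocated:>12,d}  reserved={reserved:>12,d}")
```

On a typical setup I would expect the allocated count to drop right after the del while the reserved count (roughly what nvidia-smi reports, plus the CUDA context) only goes down after empty_cache(). That is probably why the del looks like it does nothing from the outside.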