I noticed that training a ResNet-50 with AMP on my new laptop with an RTX 3070 takes much more GPU memory than without AMP. It is not a code issue, because running the same code on a workstation with an NVIDIA Tesla P100 gives the opposite result.
Laptop:
Ubuntu 20.04 with Kernel 5.8 (and 5.11)
GPU: RTX 3070
Latest NVIDIA drivers installed through Ubuntu's Software & Updates GUI (CUDA 11.2)
The command numba -s reports no errors in the CUDA install, and neither does nvidia-smi.
Training without AMP: 3.9 GB VRAM
Training with AMP: 7.4 GB VRAM
GPU memory consumption is stable during training
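For reference, the training step looks roughly like this. This is only a sketch: a tiny nn.Linear stands in for the ResNet-50, and the memory readout at the end is how I compare the two runs.

```python
import torch
import torch.nn as nn

def amp_train_step(model, data, target, optimizer, scaler, device_type):
    """One training step with automatic mixed precision (AMP)."""
    optimizer.zero_grad(set_to_none=True)
    # Forward pass under autocast: eligible ops run in FP16 on CUDA.
    with torch.autocast(device_type=device_type, enabled=scaler.is_enabled()):
        loss = nn.functional.cross_entropy(model(data), target)
    # Scale the loss to avoid FP16 gradient underflow, then step and rescale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = nn.Linear(8, 4).to(device)  # stand-in for the ResNet-50
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # no-op when CUDA is absent
x = torch.randn(2, 8, device=device)
y = torch.randint(0, 4, (2,), device=device)
loss = amp_train_step(model, x, y, optimizer, scaler, device)

if use_cuda:
    # Peak memory actually occupied by tensors (not the cached pool).
    print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```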
I would suggest double-checking your PyTorch version on the laptop (by printing torch.__version__) to make sure that you have a very fresh release/nightly build. It is surprisingly easy to have conda trick you into using an old version.
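For example (torch.version.cuda shows which CUDA toolkit the binaries were built with — an RTX 3070, compute capability 8.6, needs a CUDA 11.x build):

```python
import torch

# PyTorch build and the CUDA toolkit it was compiled against.
print(torch.__version__)
print(torch.version.cuda)  # None on CPU-only builds
if torch.cuda.is_available():
    # (8, 6) expected for an RTX 3070 (Ampere).
    print(torch.cuda.get_device_capability(0))
```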
As explained by @ptrblck here, there was a bug not too long ago:
About the memory: with AMP's automatic casting, for very small networks the memory cost of the tensors that exist twice (the FP32 original and the FP16 copy) can outweigh the benefit. (This assumes you are using PyTorch's own memory counters; if you use nvidia-smi, you might also be seeing caching-allocator effects.)
Thank you again @tom, I now understand the memory difference between my laptop and the workstation.
Using memory_allocated(), both my laptop and the workstation allocate 438 MB of GPU memory to tensors with AMP, and 435 MB without AMP.
But using memory_reserved(), my laptop reserves 6.4 GB with AMP and 2.9 GB without AMP, while the workstation reserves 1.63 GB with AMP and 2.79 GB without AMP.
So AMP reduces PyTorch's memory caching on the NVIDIA P100 (Pascal architecture) but increases it on the RTX 3070 mobile (Ampere architecture). I was expecting AMP to decrease the memory allocated/reserved, or at least leave it unchanged, not increase it, especially since I read in another thread that FP32 and FP16 tensors are not duplicated in GPU memory.
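For anyone who wants to compare on their own hardware, this is how I query the two counters. The names are the real torch.cuda API; memory_summary() additionally breaks down the allocator's cached pool.

```python
import torch

# memory_allocated() counts bytes occupied by live tensors;
# memory_reserved() counts the pool the caching allocator holds on to,
# which is roughly what nvidia-smi reports for the process.
if torch.cuda.is_available():
    print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.1f} MiB")
    print(torch.cuda.memory_summary(abbreviated=True))
else:
    print("CUDA not available")
```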
I also tried training a ResNet-152 (224*224 images, batch size 32). AMP definitely slows down training on my laptop:
The binaries are missing the cutlass kernels in the statically linked cuDNN, as described here, so you would have to build PyTorch from source and check the performance again.
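You can check which cuDNN your binaries link against before and after the source build. As a side note, enabling cudnn.benchmark often helps with fixed-shape workloads like 224x224 / batch 32, though that is a general tip on my part, not a fix for the missing-kernel issue:

```python
import torch

# cuDNN version compiled into the binaries (None if built without cuDNN);
# compare this before and after building from source.
print("cuDNN version:", torch.backends.cudnn.version())

# With fixed input shapes, let cuDNN benchmark its algorithms once per
# shape and cache the fastest one for subsequent iterations.
torch.backends.cudnn.benchmark = True
```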