Automatic Mixed Precision increases max memory used by tensors

Hi. I ran the same script, and it runs out of memory with AMP but not with FP32. Comparing the allocated memory, AMP reduces it by only 4 MB (less than 1%).
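For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how peak allocated memory can be measured with and without AMP. It is not the script from this post: the toy model, the random data, and the helper names `peak_memory_mb` and `percent_change` are assumptions made for illustration, and it needs a CUDA device to produce numbers.

```python
import torch
from torch import nn


def percent_change(baseline: float, other: float) -> float:
    """Relative change of `other` versus `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


def peak_memory_mb(use_amp: bool, steps: int = 3) -> float:
    """Run a few training steps and return the peak allocated memory in MiB.

    Hypothetical toy setup: a small MLP trained on random data.
    """
    device = torch.device("cuda")
    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # With enabled=False the scaler is a no-op, so both runs share one code path.
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    x = torch.randn(256, 1024, device=device)
    y = torch.randint(0, 10, (256,), device=device)

    # Start the peak counter from the memory currently held by model and data.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(device)

    for _ in range(steps):
        optimizer.zero_grad(set_to_none=True)
        with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
            loss = nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 2**20


if __name__ == "__main__" and torch.cuda.is_available():
    fp32_mb = peak_memory_mb(use_amp=False)
    amp_mb = peak_memory_mb(use_amp=True)
    print(f"FP32 peak: {fp32_mb:.1f} MiB")
    print(f"AMP peak:  {amp_mb:.1f} MiB ({percent_change(fp32_mb, amp_mb):+.1f}%)")
```

Note that `torch.cuda.max_memory_allocated` reports memory occupied by tensors, not the memory reserved by the caching allocator, so the two runs should be compared on the same metric. Under autocast the parameters stay in FP32 and FP16 copies are created for eligible ops, so any savings would mostly come from activations; for a small batch or a parameter-heavy model the difference may be small or even negative.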

I started a new topic, "Increased memory usage with AMP", but I can move it here if required. Thanks in advance.