Mixed precision increases memory in meta-learning?

Could you update to the latest stable release or the nightly and rerun the test?
We hit (and fixed) a caching issue for linear layers a while ago. Based on the output you are seeing, I don't think the problem is that AMP uses more memory per se; rather, the memory usage is clearly increasing over iterations, which points to a leak.
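
To help narrow it down, here is a minimal sketch (a plain AMP training loop, not your meta-learning setup; the model and shapes are placeholders) that logs `torch.cuda.memory_allocated()` each step. If the number keeps climbing instead of settling at a plateau after the first few iterations, that confirms a leak rather than AMP simply having a higher baseline memory footprint.

```python
# Minimal sketch: check whether allocated CUDA memory grows per iteration
# under autocast. Model, optimizer, and data are placeholders.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = nn.functional.cross_entropy(model(x), y)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    # A steady climb here across steps (rather than a stable plateau)
    # indicates a memory leak.
    print(f"step {step:3d}  allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")
```

If the leak only shows up in your meta-learning code, the usual suspect is keeping references to tensors (e.g. losses or inner-loop graphs) across outer iterations, so comparing against a stripped-down loop like this can help isolate it.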