Not a blocking issue or anything, but I was curious why there seems to be a minimum amount of GPU memory used by PyTorch.
For example, if I run `x = th.randn(1).cuda(1)`, my GPU memory usage for that process goes from zero to 723 MiB, even though that tensor should only require 4 bytes (a single float32 value).
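For reference, here is a minimal sketch of how I am observing this (assuming `th` is `torch` imported under that alias, and that a second GPU exists at device index 1; the 723 MiB figure is what `nvidia-smi` reports for the process):

```python
import torch as th

# Before any CUDA work, nvidia-smi shows 0 MiB for this process.
x = th.randn(1).cuda(1)  # a single float32 value: 4 bytes of tensor data

# PyTorch's own accounting reports only the tensor allocations,
# which stay tiny...
print(th.cuda.memory_allocated(1))  # bytes currently used by tensors
print(th.cuda.memory_reserved(1))   # bytes held by the caching allocator
                                    # (memory_cached on older versions)

# ...while nvidia-smi reports ~723 MiB for the whole process.
```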