What is the reason for PyTorch's GPU memory "floor"?

Not a blocking issue or anything, but I was curious why there seems to be a minimum amount of GPU memory used by PyTorch.

For example, if I run x = th.randn(1).cuda(1), the GPU memory usage for that process goes from zero to 723 MiB, even though the tensor itself should only require 4 bytes (a single 32-bit float).

The first CUDA call creates a CUDA context on that device. The context holds driver state plus the device code for PyTorch's CUDA kernels, and that is what accounts for the several hundred MiB; the tensor itself is negligible.
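
A quick way to see that the overhead lives in the context rather than in PyTorch's allocator. This is a minimal sketch, assuming th is import torch as th and a second GPU exists at index 1; the exact numbers will vary:

```python
import torch as th

# Before the first CUDA call, nvidia-smi shows ~0 MiB for this process:
# no context exists yet.
x = th.randn(1).cuda(1)  # first CUDA call on device 1 creates its context

# PyTorch's accounting only tracks tensor allocations, so it reports a
# tiny figure (the caching allocator rounds the 4-byte tensor up to one
# 512-byte block)...
print(th.cuda.memory_allocated(1))  # e.g. 512

# ...and the memory held by the caching allocator's pool is also small:
print(th.cuda.memory_reserved(1))   # e.g. 2097152 (one 2 MiB segment)

# The hundreds of MiB reported by nvidia-smi are the context itself,
# which neither of these counters includes.
```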