What does PyTorch do when initializing a CUDA context?

I wonder how PyTorch consumes GPU memory when initializing the CUDA context. I could not find any material explaining this, but in my experiments it consumes about 700 MiB of device memory.
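
For reference, here is a minimal sketch of the kind of experiment I mean; it shows the gap between what PyTorch's caching allocator tracks and the per-process footprint reported by `nvidia-smi` (exact numbers will differ by GPU, driver, and PyTorch/CUDA version, and `nvidia-smi` is assumed to be on the PATH):

```python
import subprocess
import torch

# Force CUDA context creation with a tiny allocation.
x = torch.zeros(1, device="cuda")
torch.cuda.synchronize()

# PyTorch's allocator only accounts for the tensor itself
# and the caching-allocator block it lives in.
print("allocated:", torch.cuda.memory_allocated(), "bytes")
print("reserved: ", torch.cuda.memory_reserved(), "bytes")

# The driver-level view includes the CUDA context as well,
# which is where the extra hundreds of MiB show up.
out = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,used_memory", "--format=csv"],
    capture_output=True, text=True,
)
print(out.stdout)
```

On my setup, `memory_allocated()`/`memory_reserved()` report only a few bytes/MiB, while `nvidia-smi` shows roughly 700 MiB used by the Python process, so the difference seems to come from the context initialization itself.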