Confusion on tensor's memory usage

This memory usage comes from initialization and is thus expected: the first CUDA call creates the CUDA context, which loads all kernels built for your GPU architecture. The size of the context depends on the CUDA version, your GPU, and the number of kernels in the loaded CUDA libraries as well as in native PyTorch.
You could update to CUDA 11.7 and enable lazy module loading via CUDA_MODULE_LOADING=LAZY, which loads kernels only when they are first needed and thus reduces the context size.
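As a rough sketch of how you could check this yourself: the env var has to be set before the CUDA context is created (safest: before importing torch), and the context overhead shows up in the driver-level memory report but not in PyTorch's tensor accounting. The prints below are illustrative, not guaranteed numbers.

```python
import os

# CUDA_MODULE_LOADING must be set before the context exists,
# i.e. before the first CUDA call (safest: before importing torch).
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

try:
    import torch
except ImportError:
    torch = None  # the sketch still shows the required env-var ordering

if torch is not None and torch.cuda.is_available():
    torch.cuda.init()  # creates the CUDA context and triggers kernel loading
    # The context is not tracked by the caching allocator, so compare the
    # driver-level usage with PyTorch's own tensor bookkeeping:
    free, total = torch.cuda.mem_get_info()
    print(f"device memory in use:      {(total - free) / 1024**2:.0f} MiB")
    print(f"allocated by tensors only: {torch.cuda.memory_allocated() / 1024**2:.0f} MiB")
```

The gap between the two numbers is (mostly) the context plus the allocator's cached blocks; with lazy loading enabled it should shrink noticeably on CUDA 11.7+.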