How PyTorch consumes GPU memory

How does PyTorch's GPU memory allocation mechanism work? When I load a small model (64 MB on disk) during testing, it takes up a lot of GPU memory (about 870 MB).
The architecture of the model is as follows:

rep_linear.weight    torch.Size([768, 21128])
rep_linear.bias    torch.Size([768])
LayerNorm.weight    torch.Size([768])
LayerNorm.bias    torch.Size([768])
classifier.weight    torch.Size([119, 768])
classifier.bias    torch.Size([119])

Therefore, I am confused: why does it take up so much GPU memory (nearly ten times the model's size)?
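For reference, here is how I estimate the weights' memory from the shapes listed above (a quick sketch; float32, 4 bytes per element):

```python
# Parameter-memory estimate for the layers listed above (float32 = 4 bytes).
shapes = {
    "rep_linear.weight": (768, 21128),
    "rep_linear.bias": (768,),
    "LayerNorm.weight": (768,),
    "LayerNorm.bias": (768,),
    "classifier.weight": (119, 768),
    "classifier.bias": (119,),
}

total_params = 0
for name, shape in shapes.items():
    n = 1
    for d in shape:
        n *= d
    total_params += n

total_mb = total_params * 4 / 1e6  # float32 bytes -> MB
print(f"{total_params} params ~= {total_mb:.1f} MB")  # ~65 MB, matching the 64 MB file
```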

Also, I put an embedding (61 MB of storage) on the GPU, and the same surprising thing happened.

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.nn.Embedding(20000, 768).cuda().requires_grad_(False)
Embedding(20000, 768)

20000 * 768 * 4 bytes / 1000 / 1000 ≈ 61 MB
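One way to see where the discrepancy comes from (a sketch, assuming `torch` is installed; the CUDA part is guarded because it needs a GPU): PyTorch's own counter only tracks tensor storage, while nvidia-smi also includes the CUDA context and cuDNN/cuBLAS kernel images loaded on first use.

```python
import torch

# The embedding weight itself really is only ~61 MB:
emb = torch.nn.Embedding(20000, 768)
expected_bytes = 20000 * 768 * 4  # float32 weight: 61,440,000 bytes
actual_bytes = emb.weight.element_size() * emb.weight.nelement()
print(actual_bytes == expected_bytes)

if torch.cuda.is_available():
    emb = emb.cuda()
    # memory_allocated() counts only tensor storage managed by PyTorch's
    # caching allocator; nvidia-smi additionally shows the CUDA context
    # and library kernels, which is where the "missing" hundreds of MB go.
    print(torch.cuda.memory_allocated() / 1e6, "MB held by tensors")
```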


Why does this happen?
I hope you can share your opinions. Thanks.

In addition, for the same matrix, I found that different PyTorch versions allocate different amounts of GPU memory.

However, I find that even a small tensor takes up a lot of GPU space.
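A minimal sketch of what I mean (assuming a CUDA device is available): even a 4-byte tensor does not cost only 4 bytes, because the first CUDA call creates the large CUDA context and PyTorch's caching allocator rounds each allocation up (to a multiple of 512 bytes).

```python
import torch

if torch.cuda.is_available():
    torch.cuda.init()  # pay the one-time context cost up front
    before = torch.cuda.memory_allocated()
    t = torch.zeros(1, device="cuda")  # a single float32 = 4 bytes
    after = torch.cuda.memory_allocated()
    # The allocator rounds the request up, so the delta is typically 512, not 4.
    print(after - before)
```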


So, the cuDNN library occupies around 750 MB of memory, which accounts for most of the overhead.