Hi,
I am loading this embedding model onto the GPU.
The model has 335,141,888 parameters in total.
I am loading the model in bfloat16 precision.
When I check the memory with nvidia-smi, it is using approximately 1.25 GB.
I was expecting the model to use about 0.625 GB (number of parameters × 2 bytes per bf16 parameter).
The code is:
import torch
from transformers import AutoModel

model_name = "BAAI/bge-large-en-v1.5"
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16).to("cuda")
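For reference, this is how I arrive at the 0.625 GB estimate (a rough sketch based on the parameter count above; it only accounts for the raw weights, nothing else):

```python
# Rough estimate of the memory the bare weights should occupy.
params = 335_141_888      # total parameter count reported for the model
bytes_per_param = 2       # bfloat16 = 16 bits = 2 bytes per parameter
expected_gib = params * bytes_per_param / 1024**3
print(f"Expected weight memory: {expected_gib:.3f} GiB")  # ~0.624 GiB
```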
Why is the model using this additional memory?
Appreciate any help.