Embedding model GPU memory usage

Hi,

I am loading this embedding model into GPU memory.

The model has 335,141,888 parameters in total.
I am loading the model in bfloat16 precision.
When I check memory usage with nvidia-smi, it is using approximately 1.25 GB.

I was expecting the model to use about 0.625 GB (number of parameters × 2 bytes per bf16 parameter).

The code is:
import torch
from transformers import AutoModel

model_name = "BAAI/bge-large-en-v1.5"
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16).to("cuda")
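To see what precision the weights actually ended up in, you can inspect the loaded parameters directly. A minimal sketch, using a small nn.Linear as a stand-in for the real model (the same two checks work on any nn.Module, including the one loaded above):

```python
import torch
from torch import nn

# Stand-in module; substitute the real model object loaded above.
model = nn.Linear(1024, 1024).to(torch.bfloat16)

# Confirm the precision the parameters actually use:
print(next(model.parameters()).dtype)  # torch.bfloat16

# Bytes occupied by the weights themselves
# (this excludes the CUDA context, buffers, and activations):
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(param_bytes)  # (1024*1024 + 1024) * 2 = 2_099_200
```

If the printed dtype is torch.float32 rather than torch.bfloat16, the cast did not take effect and the weights occupy 4 bytes each.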

Why is the model using this additional memory?
I'd appreciate any help.

Parameters are stored in float32 (4 bytes each) unless they are actually cast to bfloat16, and float32 matches the memory usage you are seeing: 335,141,888 × 4 bytes ≈ 1.34 GB, which nvidia-smi reports as roughly 1.25 GiB. I don't know what AutoModel does with the torch_dtype argument.
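The arithmetic is easy to check, keeping in mind that nvidia-smi reports memory in binary units (MiB/GiB), not decimal GB. A small sketch with the parameter count from the question:

```python
NUM_PARAMS = 335_141_888  # parameter count from the question

def weight_bytes(num_params, bytes_per_param):
    """Bytes occupied by the raw weights alone (no activations, no CUDA context)."""
    return num_params * bytes_per_param

fp32 = weight_bytes(NUM_PARAMS, 4)  # float32: 4 bytes per parameter
bf16 = weight_bytes(NUM_PARAMS, 2)  # bfloat16: 2 bytes per parameter

# Convert to both decimal GB and binary GiB (what nvidia-smi uses):
print(f"float32:  {fp32 / 1e9:.2f} GB = {fp32 / 2**30:.2f} GiB")  # 1.34 GB = 1.25 GiB
print(f"bfloat16: {bf16 / 1e9:.2f} GB = {bf16 / 2**30:.2f} GiB")  # 0.67 GB = 0.62 GiB
```

So the observed ~1.25 GiB is exactly what 335M float32 weights occupy, and the expected bf16 footprint would be ~0.62 GiB plus CUDA context overhead.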