Confusion about CPU/GPU memory use

On Linux,
I built a fairly simple feed-forward network consisting of a few fully-connected/batch-norm/activation layers.
I then saved the model; the resulting *.pt/*.pth file is about 40K according to `du -sh model.pt`.

In deployment, I load the model just as recommended:

```python
model = NN_model.NeuralNet()
model.load_state_dict(torch.load(model_path).state_dict())
```
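For reference, the pattern I understand as recommended saves only the `state_dict` rather than the whole pickled module. A minimal sketch of what I mean (the `NeuralNet` class here is a stand-in for my actual `NN_model.NeuralNet`, and the layer sizes are made up):

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    # stand-in for NN_model.NeuralNet: fc -> batchnorm -> activation
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 32)
        self.bn = nn.BatchNorm1d(32)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.fc(x)))

model = NeuralNet()
torch.save(model.state_dict(), "model.pt")  # save only the weights

loaded = NeuralNet()
loaded.load_state_dict(torch.load("model.pt", map_location="cpu"))
loaded.eval()
```

With `map_location="cpu"` the weights are restored onto the CPU regardless of where they were saved.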
My confusion: whether I load on CPU or CUDA, memory usage is about 1.5 GB of CPU RAM (reported by `free -h`) and ~900 MB of GPU memory (reported by `nvidia-smi`), even though the model file is only 40K.
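To illustrate the gap: the parameters themselves account for almost none of that footprint. A quick sketch that counts the tensor bytes of a comparable toy network (the layer sizes here are assumptions, not my actual model):

```python
import torch
import torch.nn as nn

# toy feed-forward net of roughly the shape described (sizes are assumptions)
net = nn.Sequential(
    nn.Linear(16, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# total bytes held by parameters and buffers -- on the order of kilobytes,
# nowhere near the ~1.5 GB the whole process occupies
param_bytes = sum(p.numel() * p.element_size() for p in net.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in net.buffers())
total_bytes = param_bytes + buffer_bytes
print(total_bytes)
```

So the bulk of the reported memory presumably comes from the process itself (the PyTorch/CUDA libraries and the CUDA context), not from the weights.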