Use of models on devices with shared CPU/GPU memory

I am having some difficulty loading models on my device, an NVIDIA Xavier NX. When I try to load a model onto the GPU, there is a very large memory spike, more than double the memory used once the model has finished loading.

The only cause I can think of is that the model is first loaded into CPU memory and then copied to GPU memory; since both are the same physical memory on this device, the model temporarily sits in RAM twice.
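
To make the question concrete, here is a minimal sketch of the loading pattern I have in mind (assuming PyTorch; `TinyNet` and `model.pt` are just placeholders for my real model and checkpoint):

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the real model is much larger.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1024, 1024)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
torch.save(model.state_dict(), "model.pt")  # placeholder checkpoint path

# The pattern I suspect causes the spike: torch.load materializes every tensor
# in CPU RAM first, then .to("cuda") makes a second copy on the "GPU" side of
# the same physical memory before the CPU copy is released.
state_dict = torch.load("model.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.to("cuda")

# A workaround I am wondering about: map the checkpoint straight to the GPU so
# tensors are moved as they are deserialized. Does this actually lower the peak
# on a shared-memory device like the Xavier NX?
state_dict = torch.load("model.pt", map_location="cuda")
model.load_state_dict(state_dict)
```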

Is this the cause, or is it something else? And is there a way I can work around it?