How is a model loaded into GPU memory?

I’m testing what happens when I load the same model onto the GPU repeatedly, like this:

import numpy as np

import model

tt = np.zeros((512, 512, 3), dtype=np.uint8)  # dummy input frame
ms = {}
while True:
    for i in range(300):
        if i not in ms:
            ms[i] = model.Model()  # load one more copy of the model
            for k, v in ms.items():
                v.infer(tt)  # run inference with every copy loaded so far
            print(i, 'loaded')
            print(len(ms), 'length')

When I load just one model, it occupies 1 GB of GPU memory.
But when I load 160 models as above, they occupy only 16 GB of GPU memory. Why doesn't GPU memory usage grow linearly with the number of models?

The first CUDA call initializes the CUDA context, which uses a fixed amount of device memory (depending on the CUDA version, the GPU used, etc.).
That one-time overhead is included in the 1 GB you measured for a single model, so the actual per-model footprint is much smaller (roughly 16 GB / 160 ≈ 100 MB), and total usage grows linearly only in that smaller increment.
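To see the context overhead directly, you could compare the driver-level usage reported by torch.cuda.mem_get_info() with what PyTorch has actually allocated right after initialization. A minimal sketch, assuming a PyTorch version that ships mem_get_info:

import torch

torch.cuda.init()  # force CUDA context creation on the current device
torch.cuda.synchronize()

free, total = torch.cuda.mem_get_info()    # driver-level view of device memory
allocated = torch.cuda.memory_allocated()  # memory held by PyTorch tensors (~0 here)

# The gap between the two numbers is mostly the CUDA context itself.
print(f"driver-used: {(total - free) / 2**20:.0f} MiB")
print(f"allocated by PyTorch: {allocated / 2**20:.0f} MiB")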
You could check how much memory PyTorch itself has allocated using print(torch.cuda.memory_allocated()).
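To see both effects in one run, you could load a few model copies in a loop and print the allocated and driver-used numbers after each one. A sketch using torchvision.models.resnet18 as a hypothetical stand-in, since I don't know what your model.Model wraps:

import torch
import torchvision.models as tvm  # resnet18 as a stand-in for model.Model

copies = []
for i in range(5):
    copies.append(tvm.resnet18().cuda())  # load one more copy onto the GPU
    torch.cuda.synchronize()
    free, total = torch.cuda.mem_get_info()
    alloc_mib = torch.cuda.memory_allocated() / 2**20
    used_mib = (total - free) / 2**20
    # memory_allocated grows by a roughly constant step per copy, while
    # driver-used additionally shows the one-time context jump on iteration 0
    print(f"copy {i}: allocated={alloc_mib:.0f} MiB, driver-used={used_mib:.0f} MiB")

The constant step per copy, plus the one-time context cost, should reproduce the pattern you observed.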