GPU memory usage problem?

Here is my code:

import torch
from torch.utils.data import DataLoader
from torchvision import models

gpu1 = 0
gpu2 = 1
net1 = models.resnet50(pretrained=True)
net1.cuda(gpu1)                 # move net1 to GPU 0
net1.load_state_dict(...)       # load a saved checkpoint
net1.eval()
myLoader = DataLoader(...)
net2 = models.resnet50(pretrained=True)
net2.cuda(gpu2)                 # move net2 to GPU 1

So I have a server with 4 GPUs (ids 0, 1, 2, and 3 respectively), and I designated two of them for net1 and net2. But somewhere after net1.cuda(gpu1) and before net2.cuda(gpu2), some memory on GPU 2 gets used. In my case, GPU 2 has 881 MB occupied.

Can I get a clue about what's going on? This did not happen when I ran the same code on another dataset.

Was the state_dict that is loaded into net1 saved on GPU 2?
Do you see the same memory usage if you comment out the net1.load_state_dict() and DataLoader creation lines of code?
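
You could also narrow it down by printing the allocated memory on each device before and after the suspicious lines. A minimal sketch (assuming a reasonably recent PyTorch that provides torch.cuda.memory_reserved):

import torch

# Print how much memory this process has allocated/reserved on each visible GPU.
# Memory held by other processes or by the CUDA context itself won't show up
# here; nvidia-smi reports the total usage instead.
for dev_id in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(dev_id) / 1024**2
    reserved = torch.cuda.memory_reserved(dev_id) / 1024**2
    print(f"cuda:{dev_id}: allocated {allocated:.1f} MB, reserved {reserved:.1f} MB")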

You are right. I commented out net1.load_state_dict(), and the memory usage is gone.
So I guess PyTorch uses GPU 2 for load_state_dict() by default?
How can I do the load_state_dict() without using irrelevant GPUs?
But why did this not happen on other datasets?

Could you check the device of the parameters inside this state_dict and then try to load the state_dict via map_location='cpu'?
I don't think this behavior is related to the dataset, but more likely to the previous training run and how the state_dict was stored, but I'm also guessing at the moment. :wink:
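
Something along these lines should show where the checkpoint tensors were saved and how to remap them (the path checkpoint.pth is just a placeholder for your file):

import torch

# torch.load restores each tensor onto the device it was saved from,
# so printing the devices shows where the checkpoint came from.
state_dict = torch.load("checkpoint.pth")  # placeholder path
for name, tensor in state_dict.items():
    print(name, tensor.device)

# Remapping to the CPU avoids touching any other GPU; load_state_dict then
# copies the values into net1's existing parameters on gpu1.
state_dict = torch.load("checkpoint.pth", map_location="cpu")
net1.load_state_dict(state_dict)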

Yeah, I printed the parameters of the state_dict, and they are on device cuda:2.
I used map_location='cpu' and it solved my problem, thanks!
I think it would make more sense to me if load_state_dict() loaded to the CPU by default.
Thanks a lot!
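
Good to hear! For future checkpoints you could also avoid the issue at save time by moving the state_dict to the CPU before saving, so a later torch.load won't try to restore the tensors onto a specific GPU. A minimal sketch (checkpoint.pth is again a placeholder):

import torch

# Save a CPU copy of the parameters so the checkpoint is device-agnostic.
cpu_state_dict = {name: tensor.cpu() for name, tensor in net1.state_dict().items()}
torch.save(cpu_state_dict, "checkpoint.pth")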