High memory consumption when moving weights to another device

I'm seeing strange behavior when moving a network from the 'cpu' device to 'cuda'.
When the move happens, a huge amount of host RAM is allocated, which is not acceptable for my use case.

I made a snapshot of memory allocation:

Line #    Mem usage    Increment   Line Contents

 21    227.0 MiB    227.0 MiB       @profile
 22                                 def __init__(self):
 23    227.0 MiB      0.0 MiB           ckpt_dir = 'hair_extraction/pytorch_hair_segmentation/models/pspnet_squeezenet_sgd_lr_0.002_epoch_46_test_iou_0.882.pth'
 24    227.0 MiB      0.0 MiB           network = 'pspnet_squeezenet'
 25    230.3 MiB      3.3 MiB           self.device =  'cuda' if torch.cuda.is_available() else 'cpu'
 26
 27    230.3 MiB      0.0 MiB           assert os.path.exists(ckpt_dir)
 28
 29    230.3 MiB      0.0 MiB           self.test_image_transforms = std_trnsf.Compose([
 30    230.3 MiB      0.0 MiB               std_trnsf.ToTensor(),
 31    230.3 MiB      0.0 MiB               std_trnsf.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
 32                                         ])
 33    248.0 MiB     17.8 MiB           self.net = get_network(network)
 34   2942.6 MiB   2694.6 MiB           self.net.to(self.device)
 35   2952.6 MiB     10.0 MiB           state = torch.load(ckpt_dir, map_location=torch.device(self.device))
 36   2952.6 MiB      0.0 MiB           self.net.load_state_dict(state['weight'])
 37   2952.6 MiB      0.0 MiB           self.net.eval()

As can be seen on line 34, there is a big memory allocation there (an increment of 2694.6 MiB).
But when I explicitly set the device to 'cpu', everything behaves as expected:

Line #    Mem usage    Increment   Line Contents

21    226.4 MiB    226.4 MiB       @profile
22                                 def __init__(self):
23    226.4 MiB      0.0 MiB           ckpt_dir = 'hair_extraction/pytorch_hair_segmentation/models/pspnet_squeezenet_sgd_lr_0.002_epoch_46_test_iou_0.882.pth'
24    226.4 MiB      0.0 MiB           network = 'pspnet_squeezenet'
25    226.4 MiB      0.0 MiB           self.device =  'cpu'
26
27    226.4 MiB      0.0 MiB           assert os.path.exists(ckpt_dir)
28
29    226.4 MiB      0.0 MiB           self.test_image_transforms = std_trnsf.Compose([
30    226.4 MiB      0.0 MiB               std_trnsf.ToTensor(),
31    226.4 MiB      0.0 MiB               std_trnsf.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
32                                         ])
33    243.4 MiB     17.1 MiB           self.net = get_network(network)
34    243.8 MiB      0.4 MiB           self.net.to(self.device)
35    256.3 MiB     12.5 MiB           state = torch.load(ckpt_dir, map_location=torch.device(self.device))
36    256.3 MiB      0.0 MiB           self.net.load_state_dict(state['weight'])
37    256.3 MiB      0.0 MiB           self.net.eval()

How is this possible?
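One way to narrow this down might be to check whether the increase comes from the first CUDA call (initializing the CUDA context) rather than from the weights themselves. The sketch below is only a diagnostic under that assumption, not a fix. The helpers `rss_mib` and `measure` are hypothetical names introduced here, the `torch.nn.Linear` layer is a stand-in for the real network, and the script assumes a Unix-like system (it reads `/proc/self/statm` on Linux and falls back to peak RSS elsewhere).

```python
import os
import resource
import sys


def rss_mib():
    """Return this process's resident set size (host RAM) in MiB."""
    try:
        # Linux: the second field of /proc/self/statm is the current RSS in pages.
        with open("/proc/self/statm") as f:
            pages = int(f.read().split()[1])
        return pages * os.sysconf("SC_PAGE_SIZE") / 2**20
    except (OSError, ValueError, IndexError):
        # Fallback: peak RSS, reported in bytes on macOS and KiB on Linux.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak / (2**20 if sys.platform == "darwin" else 2**10)


def measure(label, fn):
    """Call fn() and print how much host RSS changed while it ran."""
    before = rss_mib()
    result = fn()
    after = rss_mib()
    delta = after - before
    print(f"{label:<26} {after:9.1f} MiB  ({delta:+.1f} MiB)")
    return result, delta


def main():
    try:
        import torch
    except ImportError:
        print("torch is not installed; nothing to measure")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available; nothing to measure")
        return

    # Step 1: initialize CUDA without touching any model weights.
    measure("torch.cuda.init()", torch.cuda.init)
    # Step 2: allocate a tiny tensor on the GPU.
    measure("tiny CUDA tensor", lambda: torch.zeros(1, device="cuda"))
    # Step 3: move a small stand-in model (not the real network) to the GPU.
    net = torch.nn.Linear(1024, 1024)
    measure("net.to('cuda')", lambda: net.to("cuda"))


if __name__ == "__main__":
    main()
```

If most of the increment shows up in the first two steps and almost none in the last one, that would suggest the host memory is tied to CUDA initialization and not to the size of the model being moved.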