My network has roughly 10 million parameters (9,508,738 to be precise). Assuming float32 for each parameter, this amounts to about 38 MB of memory for the network's parameters alone.
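For reference, a minimal sketch of how I arrived at that figure (assuming every parameter is a 4-byte float32):

```python
import torch

def param_memory_mb(model: torch.nn.Module) -> float:
    """Total parameter memory in MB, assuming float32 (4 bytes per parameter)."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * 4 / 1e6

# With my network: 9_508_738 params * 4 bytes / 1e6 ≈ 38 MB
```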
However, when I try to increase my batch size from 5 to 10, I get the following error:
```
RuntimeError: CUDA out of memory. Tried to allocate 450.00 MiB (GPU 0; 10.91 GiB total capacity; 8.84 GiB already allocated; 451.00 MiB free; 378.00 KiB cached)
```
I am using a 1080 Ti and currently don't understand what takes up the entire GPU memory. nvidia-smi idles at around 300 MB when no training is running, so no other application is taking up considerable space either.
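To narrow this down, here is a minimal sketch of how one can query PyTorch's own view of GPU memory at different points in the training loop (`torch.cuda.memory_allocated` and `torch.cuda.memory_cached` are existing PyTorch calls; in newer versions `memory_cached` was renamed to `memory_reserved`):

```python
import torch

def log_gpu_memory(tag: str, device: int = 0) -> None:
    """Print memory held by live tensors and memory reserved by the caching allocator."""
    allocated = torch.cuda.memory_allocated(device) / 1e6  # MB held by live tensors
    cached = torch.cuda.memory_cached(device) / 1e6        # MB reserved by the allocator
    print(f"[{tag}] allocated: {allocated:.1f} MB, cached: {cached:.1f} MB")

# e.g. call log_gpu_memory("after forward") and log_gpu_memory("after backward")
# inside the training loop to see where the memory actually grows
```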
Help would be appreciated. Also, I don't understand the meaning of these lines that I found in some code:
```python
net = torch.nn.DataParallel(net, device_ids=range(torch.cuda.device_count()))
cudnn.benchmark = True
```
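For context, here is roughly how these lines appeared in the code I found (a minimal sketch; `MyNet` is a placeholder for the actual model class):

```python
import torch
import torch.backends.cudnn as cudnn

net = MyNet()   # placeholder for the actual model
net = net.cuda()  # move the model to the GPU first

# Replicate the model across all visible GPUs; inputs are split along
# the batch dimension and outputs are gathered back on the default GPU.
net = torch.nn.DataParallel(net, device_ids=range(torch.cuda.device_count()))

# Let cuDNN benchmark several convolution algorithms on the first iteration
# and cache the fastest one (useful when input sizes don't change).
cudnn.benchmark = True
```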
Could someone elaborate on these? Thank you in advance.