CUDA out of memory question

My network has roughly 10 million parameters (9,508,738 to be exact). Assuming float32, i.e. 4 bytes per parameter, that comes to 9,508,738 × 4 ≈ 38 MB of memory for the network's parameters.

However, when I try to increase my batch size from 5 to 10, I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 450.00 MiB (GPU 0; 10.91 GiB total capacity; 8.84 GiB already allocated; 451.00 MiB free; 378.00 KiB cached)

I am using a 1080 Ti and currently don't understand what takes up the entire GPU memory. nvidia-smi idles at around 300 MB when no training is running, so no other application takes up considerable space either.

Help would be appreciated. I also don't understand the meaning of these lines that I found in some code:

net = torch.nn.DataParallel(net, device_ids=range(torch.cuda.device_count()))
cudnn.benchmark = True

Please elaborate. Thank you in advance.

Most of the memory is used for storing the activations of each layer of your model, because these activations are needed to compute gradients during the backward pass. They usually take far more memory than your model parameters do, and they scale with batch size, which is why going from 5 to 10 pushes you over the limit.
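You can see this yourself by watching allocated memory around a forward pass. Here is a minimal sketch (the model and tensor sizes are made up for illustration, not taken from your network):

import torch
import torch.nn as nn

# Tiny hypothetical model: its parameters are well under 1 MiB,
# but its activations at batch size 10 are hundreds of MiB.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(),
).cuda()

param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameters: {param_bytes / 2**20:.2f} MiB")

x = torch.randn(10, 3, 256, 256, device="cuda")  # batch of 10, as in your case

before = torch.cuda.memory_allocated()
out = model(x)  # autograd keeps every intermediate activation alive for backward
print(f"forward (training): {(torch.cuda.memory_allocated() - before) / 2**20:.1f} MiB")

del out  # dropping the output releases the graph and its saved activations
with torch.no_grad():  # inference: intermediates are freed layer by layer
    before = torch.cuda.memory_allocated()
    out = model(x)
    print(f"forward (no_grad): {(torch.cuda.memory_allocated() - before) / 2**20:.1f} MiB")

The training forward pass allocates several times more memory than the no_grad one, even though the parameters are tiny in both cases.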

nn.DataParallel is used to split your computation across multiple GPUs: your batch is split along its first dimension among the specified devices, each device runs a replica of the model, and the outputs are gathered back on the first device.
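For example, a toy usage sketch (hypothetical model and batch size) looks like this:

import torch
import torch.nn as nn

# Wrap a model so a batch of 32 is scattered along dim 0 to all visible GPUs,
# e.g. 16 samples per GPU on a 2-GPU machine.
net = nn.Linear(128, 10)
net = nn.DataParallel(net, device_ids=range(torch.cuda.device_count()))
net = net.cuda()

x = torch.randn(32, 128, device="cuda")  # scattered to each GPU along dim 0
out = net(x)                             # gathered back on GPU 0, shape (32, 10)
print(out.shape)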

cudnn.benchmark controls how cuDNN chooses the algorithms for layer operations (e.g. convolution). When enabled, it tries to find the fastest algorithm for the model in question, at the expense of some extra memory. Setting it to False should reduce the GPU memory needed a little.
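So if memory is tight, you can disable it:

import torch.backends.cudnn as cudnn

# With benchmark=True, cuDNN times candidate algorithms per input shape and
# caches the fastest; some candidates need extra workspace memory.
# Disabling it trades a bit of speed for a smaller memory footprint.
cudnn.benchmark = False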

Thank you. I am wondering if I need to call detach() and other such methods manually in my main training routine in order to free up memory, or is that handled automatically?