PyTorch Startup Memory Occupation

Hi,

I have a simple linear network class in pytorch with 6000 hidden units.
When I call the following lines, it occupies 893MB of memory:
device = torch.device("cuda")
model = DeepNN().to(device)

where DeepNN is the class name. This initial memory usage increases as I increase the number of hidden units.
I'm wondering what this means and whether there is any way to fix it.
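
For reference, a simplified sketch of the kind of model I mean (the layer sizes and output dimension here are just placeholders, not the real definition):

import torch
import torch.nn as nn

class DeepNN(nn.Module):
    def __init__(self, in_features=6000, hidden=6000, out_features=10):
        super().__init__()
        # Two linear layers with 6000 hidden units
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

device = torch.device("cuda")
model = DeepNN().to(device)  # copies all parameters to GPU memory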

Update: To give some more information, with a network using nn.Linear(6000, 6000), I expect the initial memory to be about 144MB, but it is currently 600MB. When I increase the hidden size to nn.Linear(30000, 30000), I expect the memory to be around 3.5GB, but it pre-occupies 5GB.
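
The expected numbers come from counting float32 parameters (weight matrix plus bias, 4 bytes each), e.g.:

# Expected parameter memory for nn.Linear(n, n) stored as float32
def linear_param_mb(n):
    return (n * n + n) * 4 / 1024**2  # n*n weights + n biases, 4 bytes each

print(linear_param_mb(6000))   # ~137MB (~144MB if you count in base-10 megabytes)
print(linear_param_mb(30000))  # ~3433MB, i.e. roughly 3.4GB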

Basically, the bigger your net is, the more memory it requires.

I understand. My question is why it occupies this memory before running the actual computations. Is that normal?
Even with a one-layer network, it constantly occupies this amount of memory.

CUDA requires some memory to initialize its libraries. It depends on the GPU model; around 600MB.
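
One quick way to see the context overhead on its own (a rough sketch; the exact figure depends on your GPU, driver, and CUDA/PyTorch versions) is to initialize CUDA without allocating any tensors and compare nvidia-smi against PyTorch's own allocator:

import torch

torch.cuda.synchronize()  # forces CUDA initialization, i.e. creates the context

# PyTorch's caching allocator has not handed out any memory yet:
print(torch.cuda.memory_allocated())  # 0 bytes
# ...but nvidia-smi will already report a few hundred MB for this process,
# which is the CUDA context itself.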

Thanks for the reply.
What I see is that the pre-occupied memory keeps increasing as I increase the number of hidden units. With 20000 hidden units, it occupies around 3GB of memory.
Is there any way to avoid this? I am basically losing memory that I need for the computation.

The parameters have to be stored on the GPU so that the computation itself can be performed on the device.
As @JuanFMontesinos said, the CUDA context will use some memory besides that.
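
If you want to check how much of that is the parameters themselves, you can sum their sizes directly (a small sketch, assuming float32 parameters):

import torch
import torch.nn as nn

def param_mem_mb(model):
    # Total bytes occupied by all parameters (weights and biases)
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2

model = nn.Linear(6000, 6000).to('cuda')
print('{:.1f}MB of parameters'.format(param_mem_mb(model)))  # ~137.4MB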

If it is CUDA context memory, why is it related to the number of hidden units? If it is for the parameters, the numbers I am getting are not what the calculations say. I edited my question with some examples.

Thanks for the additional information!
Could you run the following code and check the memory usage:

import torch
import torch.nn as nn

torch.cuda.synchronize()  # Create CUDA context (947MB in my case)

model = nn.Linear(6000, 6000).to('cuda')
expected_mem = (model.in_features * model.out_features + model.out_features) * 4 / 1024**2
print('Max allocated {:.3f}MB, expected {:.3f}MB'.format(
    torch.cuda.max_memory_allocated() / 1024**2,
    expected_mem))

model = nn.Linear(30000, 30000).to('cuda')
expected_mem = (model.in_features * model.out_features + model.out_features) * 4 / 1024**2
print('Max allocated {:.3f}MB, expected {:.3f}MB'.format(
    torch.cuda.max_memory_allocated() / 1024**2,
    expected_mem))

> Max allocated 138.023MB, expected 137.352MB
> Max allocated 3572.138MB, expected 3433.342MB

On my system, the CUDA context uses a constant memory of ~950MB, while the model parameters fill up additional memory.
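
Also note that nvidia-smi shows the context plus whatever the caching allocator has reserved, which can be larger than what is actually used by tensors. You can separate the two like this (a small sketch; memory_reserved is available in newer PyTorch versions, older ones call it memory_cached):

print('allocated: {:.1f}MB'.format(torch.cuda.memory_allocated() / 1024**2))
print('reserved:  {:.1f}MB'.format(torch.cuda.memory_reserved() / 1024**2))
# nvidia-smi roughly shows: CUDA context + reserved memory, so it reads higher.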

Thanks. Problem solved. There was a bug in the model definition.