I have a simple linear network class in PyTorch with 6000 hidden units.
When I call the following lines, it occupies 893 MB of GPU memory.
device = torch.device("cuda")
model = DeepNN().to(device)
where DeepNN is the class name. This initial memory usage grows as I increase the number of hidden units.
I am wondering what this means and whether there is any way to fix it.
Update: To give some more information: with a network using nn.Linear(6000, 6000), I expect the initial memory to be 144 MB, but it is currently 600 MB. When I increase the layer size to nn.Linear(30000, 30000), I expect the memory to be around 3.5 GB, but it occupies 5 GB.
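For reference, the expected parameter memory can be worked out by hand (a quick sketch, assuming float32 parameters: nn.Linear(n, n) stores an n×n weight matrix plus an n-element bias, at 4 bytes per element):

```python
def linear_param_bytes(in_features, out_features, dtype_bytes=4):
    """Bytes needed for an nn.Linear layer's parameters:
    weight (out_features x in_features) plus bias (out_features)."""
    n_params = out_features * in_features + out_features
    return n_params * dtype_bytes

print(linear_param_bytes(6000, 6000) / 1e6)    # 144.024 -> ~144 MB
print(linear_param_bytes(30000, 30000) / 1e9)  # 3.60012 -> ~3.6 GB
```

So the 144 MB and ~3.5 GB estimates above match the raw parameter sizes; anything beyond that comes from other sources.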
I understand. My question is why the memory is occupied before the actual computations run. Is that normal?
Even with a one-layer network, this amount of memory stays occupied.
Thanks for the reply.
What I see is that the pre-occupied memory keeps growing as I increase the number of hidden units. With 20000 hidden units, it occupies around 3 GB.
Is there any way to avoid this? I am basically losing memory that I need for the actual computation.
The parameters have to be stored on the GPU so that the computation itself can be performed on the device.
As @JuanFMontesinos said, the CUDA context will use some memory besides that.
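One way to separate the two contributions is to compare what PyTorch itself has allocated for tensors against the total that nvidia-smi reports: torch.cuda.memory_allocated() counts only tensor storage, while the nvidia-smi figure also includes the CUDA context (driver state, compiled kernels, cuBLAS/cuDNN workspaces), which varies with the CUDA/PyTorch version and the GPU. A rough sketch, using a bare nn.Linear as a stand-in for the DeepNN model from the question:

```python
import torch
import torch.nn as nn

# Stand-in for the DeepNN model from the question.
model = nn.Linear(6000, 6000)

# Parameter memory computed from the model itself (device-independent).
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameters: {param_bytes / 1e6:.1f} MB")  # ~144 MB

if torch.cuda.is_available():
    model = model.to("cuda")
    torch.cuda.synchronize()
    # Memory PyTorch has allocated for tensors -- should match param_bytes.
    print(f"allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")
    # Memory held by PyTorch's caching allocator (>= allocated).
    print(f"reserved:  {torch.cuda.memory_reserved() / 1e6:.1f} MB")
    # Whatever nvidia-smi shows beyond "reserved" is the CUDA context itself.
```

If "allocated" matches your hand calculation but nvidia-smi shows much more, the gap is context overhead rather than your parameters.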
If it is CUDA context memory, why is it related to the number of hidden units? And if it is for the parameters, the numbers I am getting are not what the calculations say. I edited my question with some examples.