Will multiple GPUs increase the memory limit (allow deeper models)?

I have a GRU model, and the depth of my model is limited by my GPU’s memory. Would having two of the same GPUs allow for twice the depth? Could I also use my SSD or RAM as memory instead (without losing GPU processing)?

In case it is case specific: I have a 2-layer GRU model with 1000 inputs and 500 hidden units (that’s my current limit) and would like to increase it to 1000 hidden units. I’m also aware that reducing the batch size would help, but due to class imbalance I can’t reduce it.

You might want to look into model parallelism:
model parallel best practices
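
A minimal sketch of the idea from that tutorial, assuming two GPUs (`cuda:0` and `cuda:1`) and the 1000-input GRU from the question; the layer names and the final linear head are made up for illustration:

```python
import torch
import torch.nn as nn

class ModelParallelGRU(nn.Module):
    def __init__(self, n_in=1000, n_hid=1000):
        super().__init__()
        # one GRU layer per GPU, so each device only holds half the weights
        self.gru1 = nn.GRU(n_in, n_hid, batch_first=True).to('cuda:0')
        self.gru2 = nn.GRU(n_hid, n_hid, batch_first=True).to('cuda:1')
        self.head = nn.Linear(n_hid, 1).to('cuda:1')  # hypothetical output head

    def forward(self, x):
        h1, _ = self.gru1(x.to('cuda:0'))
        h2, _ = self.gru2(h1.to('cuda:1'))  # hop the activations across GPUs
        return self.head(h2[:, -1])         # prediction from the last time step

model = ModelParallelGRU()
out = model(torch.randn(32, 50, 1000))      # (batch, seq_len, input_size)
```

Each GPU only stores the parameters, gradients, and activations of its own layers, which is what lifts the memory ceiling.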


Thanks for the link, it answers my initial question. The PyTorch pages I found earlier didn’t give me a definitive answer, but here it clearly states that it can solve memory issues.

Although I’m still wondering about the second part of my question, regarding using an SSD or RAM as a solution.

To clarify, are you wondering if you can store modules in RAM and then load them as you need to? For example, if a two-layer model is too massive, storing layer1 on the GPU and layer2 in RAM?

Not really, is the short answer. The modules need to be on the GPU for the GPU to perform the forward and backward passes. You could do something where, once the forward pass through layer1 is complete, you move it to RAM, move layer2 onto the GPU, and continue the forward pass. But shuttling layers between memories on every forward pass will be quite slow (and I’m not sure how happy autograd would be about this).
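
For what it’s worth, here is roughly what that swapping could look like at inference time, under `torch.no_grad()` to sidestep the autograd question; the two linear layers are hypothetical stand-ins for the real model:

```python
import torch
import torch.nn as nn

# hypothetical oversized layers, standing in for the real model
layer1 = nn.Linear(1000, 1000)
layer2 = nn.Linear(1000, 1000)
x = torch.randn(8, 1000)

with torch.no_grad():        # inference only; training this way is murkier
    layer1.to('cuda')
    h = layer1(x.to('cuda'))
    layer1.to('cpu')         # free VRAM before loading the next layer
    layer2.to('cuda')
    out = layer2(h)          # h is already on the GPU
    layer2.to('cpu')
```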

However, you can just keep part of the model on the CPU. It won’t be as fast as having it all on a GPU, but it’s a possible route.
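
Something like this sketch, reusing the dimensions from the question (which layer sits on which device is an arbitrary choice here):

```python
import torch
import torch.nn as nn

class SplitGRU(nn.Module):
    def __init__(self, n_in=1000, n_hid=1000):
        super().__init__()
        self.gru1 = nn.GRU(n_in, n_hid, batch_first=True).to('cuda')
        self.gru2 = nn.GRU(n_hid, n_hid, batch_first=True)  # stays on the CPU

    def forward(self, x):
        h1, _ = self.gru1(x.to('cuda'))
        h2, _ = self.gru2(h1.to('cpu'))  # device hop on every forward pass
        return h2

out = SplitGRU()(torch.randn(32, 50, 1000))  # (batch, seq_len, input_size)
```

Autograd tracks the cross-device `.to()` calls, so the backward pass works across the hop; the transfer itself is just extra latency.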


You can have part of your model on the GPU and the other part in RAM. But as far as I know, SSD storage is not supported, and it would probably be much slower than RAM anyway.

partial GPU partial CPU model

I would stick to models that fit completely within GPU VRAM, since the CPU operations could be a huge bottleneck: CPUs can’t get nearly as much parallelism as GPUs. It could also get tricky to load the model at inference time when some weight tensors are on the GPU and some are on the CPU.
