Hello, guys.
I am currently stuck on a problem: I have multiple GPUs with 11 GB of memory each, but the model and optimizer I use are pretty huge and complicated. So even with batch size == 1, it still throws CUDA Out of Memory.
What can I do to solve this?
Can I put different parts of the model on different GPUs? I notice that most of the memory consumption comes from the model parameters, gradients, and the optimizer's internal state.
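For what it's worth, here is a minimal sketch of the split-model idea: keep one part of the model on `cuda:0` and another on `cuda:1`, moving activations between devices in `forward`. The class name, layer sizes, and the two-way split are all placeholders for illustration, not your actual model; the sketch falls back to CPU when two GPUs are not available so it stays runnable.

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    """Toy model split across two devices (hypothetical layer sizes)."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First stage lives on dev0, second stage on dev1.
        self.part1 = nn.Sequential(nn.Linear(128, 256), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(256, 10).to(dev1)

    def forward(self, x):
        x = self.part1(x.to(self.dev0))
        # Move the intermediate activations to the second device.
        return self.part2(x.to(self.dev1))

# Use two GPUs if present; otherwise run the sketch on CPU.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")

model = SplitModel(dev0, dev1)
out = model(torch.randn(1, 128))
print(tuple(out.shape))  # (1, 10)
```

Note the trade-off: this frees per-GPU memory but serializes the forward pass across devices, so only one GPU computes at a time unless you also pipeline micro-batches.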
After loading the model onto the GPU, it consumes 3 GB of memory. I use two optimizers for different parts of the model. After the first loss.backward() and optimizer.step() with batch_size == 1, the single GPU is almost fully occupied.
Or, how can I restore the GPU memory to 3 GB after the first optimization step? I tried deleting the loss and calling zero_grad on the model, but it still does not work :(
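In case it helps to show what I tried, here is a small self-contained sketch of the cleanup pattern (the model, optimizer, and loss are stand-in placeholders, not my real ones). `zero_grad(set_to_none=True)` actually frees the gradient tensors instead of zeroing them in place, and dropping the reference to `loss` lets the autograd graph be released. One thing I'm aware of: stateful optimizers like Adam keep per-parameter buffers after the first step by design, so memory may never go all the way back to the post-load 3 GB.

```python
import torch

# Stand-in model/optimizer for illustration.
model = torch.nn.Linear(64, 64)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(1, 64)
loss = model(x).sum()
loss.backward()
opt.step()

# Free gradient tensors rather than filling them with zeros.
opt.zero_grad(set_to_none=True)
# Drop references so the computation graph can be garbage-collected.
del loss, x
if torch.cuda.is_available():
    # Returns cached blocks to the driver; affects nvidia-smi readings,
    # not how much memory PyTorch itself can still allocate.
    torch.cuda.empty_cache()

print(all(p.grad is None for p in model.parameters()))  # True
```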
Looking forward to your reply!