Hi, I am having an issue with backprop gradients eating up a lot of GPU memory.
Note: I tried decreasing BATCH_SIZE, but it does not help.
Someone told me to use DeepSpeed ZeRO-Offload.
However, my code is structured like a GNN: NN_output = graph.forward(NN_input, types='f')
So the usual outputs = model_engine(inputs) pattern does not seem to fit my case?
My args also do not follow that code style.
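To make the mismatch concrete, here is a minimal sketch in plain PyTorch (no DeepSpeed involved) of the call pattern I mean. The `Graph` and `GraphWrapper` classes, the layer inside them, and what `types` actually selects are all made up for illustration; my real graph is different. A thin wrapper like this would at least give the `wrapped(inputs)` call shape, but I am not sure it is the right approach with DeepSpeed.

```python
import torch
import torch.nn as nn


class Graph(nn.Module):
    """Hypothetical stand-in for my graph; only the forward signature matters."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, types="f"):
        # 'types' selects a branch; the branches here are invented for the example.
        out = self.lin(x)
        return torch.relu(out) if types == "f" else out


class GraphWrapper(nn.Module):
    """Thin wrapper that fixes the extra keyword so the call becomes wrapper(inputs)."""

    def __init__(self, graph: nn.Module, types: str = "f"):
        super().__init__()
        self.graph = graph
        self.types = types

    def forward(self, x):
        return self.graph(x, types=self.types)


torch.manual_seed(0)
graph = Graph()
wrapped = GraphWrapper(graph, types="f")

NN_input = torch.randn(2, 4)
NN_output = wrapped(NN_input)  # same call shape as outputs = model_engine(inputs)
```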
Any ideas?