Hi, I'm having an issue where the gradients stored for backprop eat up a lot of GPU memory.
Note: I already tried decreasing BATCH_SIZE, but it does not help.
Someone suggested using DeepSpeed ZeRO-Offload.
However, my code is structured like a GNN: NN_output = graph.forward(NN_input, types="f")
So the usual outputs = model_engine(inputs) pattern does not seem to fit my case.
My args setup also does not follow the code style shown in the DeepSpeed examples.
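For context, here is a minimal runnable sketch of the call pattern I mean (the class and names are illustrative stand-ins for my actual GNN, not the real code):

```python
# Illustrative stand-in for my GNN; the real `graph` object is far more complex.
class Graph:
    def forward(self, nn_input, types="f"):
        # The extra `types` keyword argument is what doesn't match the
        # `outputs = model_engine(inputs)` pattern from the DeepSpeed examples.
        return [x * 2 for x in nn_input]

graph = Graph()
NN_output = graph.forward([1, 2, 3], types="f")
print(NN_output)
```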
Any ideas?