Model parallelism across multiple GPUs: forward/backward graph

In model parallelism, a DNN is divided into sub-modules, and each sub-module is handled by a different GPU.

The forward graph, I assume, spans multiple GPUs.

Will the backward graph along with any internal data also span multiple GPUs?

Yes, this should be the case.
You can check it by using model sharding and then printing the device of the gradients for each submodule.
They should be located on the corresponding device.
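For example, something like this minimal sketch (toy layer sizes, assumes two visible GPUs) shows where each submodule's gradients end up after a backward pass:

import torch
import torch.nn as nn

# Minimal sketch: split a toy model across two GPUs, run one
# forward/backward pass, and print the device of each parameter's gradient.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.module1 = nn.Linear(16, 16).to("cuda:0")
        self.module2 = nn.Linear(16, 4).to("cuda:1")

    def forward(self, x):
        x = self.module1(x)
        return self.module2(x.to("cuda:1"))

model = ToyNet()
out = model(torch.randn(8, 16, device="cuda:0"))
out.sum().backward()

for name, p in model.named_parameters():
    print(name, p.grad.device)  # module1.* -> cuda:0, module2.* -> cuda:1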

Genuine question: is that “model sharding” or model parallelism actually implemented yet? So far, I’ve only done the multi-GPU stuff with data parallelism (e.g., https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html).

Yes, you can do “model parallelism”, for instance with 2 GPUs, like this:

import torch.nn as nn

class Network(nn.Module):
    def __init__(self, split_gpus):
        super().__init__()
        self.module1 = nn.Sequential()  # some layers
        self.module2 = nn.Sequential()  # some layers

        self.split_gpus = split_gpus
        if self.split_gpus:
            self.module1.to("cuda:0")
            self.module2.to("cuda:1")

    def forward(self, x):
        x = self.module1(x)
        if self.split_gpus:
            x = x.to("cuda:1")
        return self.module2(x)

Oh I see, thanks!

Regarding the speed-up: it is probably less than for data parallelism, right? Because module 2 needs the results of module 1 before it can compute its forward pass (and vice versa for backprop).

So the advantage would be that it allows fitting large models into memory, rather than improving training speed?

Yes, exactly. At least that’s what I’ve used it for.
As far as I know, the transfer between the two GPUs is done via P2P, so no host communication is needed.
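If you want to verify this on your own machine, newer PyTorch versions expose a check for peer access (just a quick sketch):

import torch

# Reports whether direct peer-to-peer access between device 0 and device 1
# is possible on this machine.
print(torch.cuda.can_device_access_peer(0, 1))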


I implemented something like @Dhorka’s example above, and the following (simplified) code works in the forward pass, but…

from collections import OrderedDict
import torch.nn as nn

class Network(nn.Module):  # simplified
    def __init__(self, split_gpus):
        super().__init__()
        self.layerseq1 = nn.Sequential(OrderedDict())  # some layers
        self.layerseq2 = nn.Sequential(OrderedDict())  # some layers
        self.split_gpus = split_gpus
        if self.split_gpus:
            self.layerseq1.cuda(device=0)
            self.layerseq2.cuda(device=1)

    def forward(self, x):
        x = self.layerseq1(x)
        if self.split_gpus:
            x = x.cuda(device=1)
        return self.layerseq2(x)

…it runs into problems when it tries to compute the nn.MSELoss, claiming:

Assertion THCTensor_(checkGPU) [...] failed. Some of the weight/gradient/input tensors 
are on different GPUs. Please move them to a single one.
 at /opt/software/pytorch/aten/src/THCUNN/generic/MSECriterion.cu:13

While running an epoch such as:

model = Network(split_gpus=True)
criterion = nn.MSELoss()
for batch_idx, (data, target) in enumerate(some_dataset()):
    data = data.cuda()
    output = model(data)
    loss = criterion(output, data)   # <-- fails at runtime
    optimizer.zero_grad()
    ...

I thought that couldn’t possibly mean that all trained layers need to be on the same GPU, so I tried moving both arguments to the same GPU before the loss evaluation (below), but that seems to have no effect:

output.cuda(device=1)
data.cuda(device=1)

I’m still getting up to speed on pytorch, so any guidance would be appreciated…

What is wrong with wrapping the model in DataParallel, as @rasbt mentioned above, besides the memory motivation?

With DataParallel, the gradients from all GPUs are temporarily accumulated on one of the main GPUs. This can be an issue if all your GPUs already use up a lot of memory, because it can result in an out-of-memory error. So, currently it is not possible to accumulate the gradients on a GPU that is not also used during training and already holding a model replica in memory. In practice, I haven’t run into problems with this yet, but it could be nicer (although maybe a bit slower) to use a dedicated GPU for the gradient accumulation so that memory spikes can be avoided.
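To illustrate the point, here is a minimal sketch with made-up layer and batch sizes: by default DataParallel gathers the outputs on output_device, which defaults to device_ids[0], so one of the training GPUs also holds the gathered outputs and accumulated gradients.

import torch
import torch.nn as nn

# Sketch: the gather/accumulation device is device_ids[0] by default,
# which is also one of the GPUs running the replicas.
model = nn.Linear(128, 10).to("cuda:0")
dp_model = nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(64, 128, device="cuda:0")
out = dp_model(x)     # replicas run on cuda:0 and cuda:1
print(out.device)     # cuda:0 - where the outputs are gathered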

I haven’t had this issue yet either, but I see where you are coming from. It would be great to have an option to use a smaller GPU just for gradient accumulation, as a central hub.

I just realized that my answer was in a different context… Regarding your question, I thought you were referring to “Uneven GPU utilization during training backpropagation”, so my answer may be a bit out of context given the problem discussed in this thread :stuck_out_tongue:

@halahup - It’s 100% the memory motivation. Loading the full model for training exceeds the amount of VRAM on a V100, so data parallelism won’t help.

Ok, gotcha, then yes!

Could the issue be related to this? On pytorch’s CUDA Semantics page, it says: “Unless you enable peer-to-peer memory access, any attempts to launch ops on tensors spread across different devices will raise an error.”

Does running MSELoss count as “launching ops”? I’d test it by enabling it, but there doesn’t seem to be any documentation on how to enable GPU peer-to-peer memory access in PyTorch, either here or on Google, apart from one post where @smth indicates it’s on by default…

Seems like your labels and outputs are on different GPUs. Move them to the same GPU and it should work. Something like:

output = model(data)
target = target.to(output.device)  # move the target to the same device as the output
loss = F.nll_loss(output, target)
loss.backward()
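Adapted to the MSELoss setup from above (just a sketch; it assumes the model output ends up on cuda:1 because layerseq2 lives there), that would look like:

criterion = nn.MSELoss()

output = model(data)              # output lives on cuda:1
target = data.to(output.device)   # move the reconstruction target to the same GPU
loss = criterion(output, target)
loss.backward()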