Hello, I have a pretty odd request.
I have a fairly simple program that uses Hugging Face and nn.DataParallel to fine-tune a model on 4 GPUs. As I understand it (please correct me if I'm wrong), the GPUs spend part of the forward, backward, and optimizer update steps waiting on synchronization, and I would like to add busy waiting during those waits. Essentially, the goal is to keep each GPU's utilization at 100%. After doing some investigation (as outlined in Microsoft's paper), I see that utilization and power both dip during the synchronization phases. I know this seems a bit odd, but it's for a larger project that I am working on.
So how can I launch a kernel to do some busy waiting on the GPUs during synchronization?
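To make the request concrete, here is a minimal sketch of one possible approach, not a validated solution. It assumes that throwaway matrix multiplications on a separate CUDA stream are an acceptable stand-in for a true busy-wait kernel, and that a Python thread per GPU can be started and stopped around the training step. The names `busy_wait_on_gpu`, `stop_event`, and `size` are made up for illustration and do not come from the paper or from any library.

```python
import threading

import torch


def busy_wait_on_gpu(device, stop_event, size=1024):
    """Run throwaway matmuls on `device` until `stop_event` is set.

    Returns the number of filler matmuls launched (0 if CUDA is unavailable).
    """
    if not torch.cuda.is_available():
        return 0  # assumption: silently do nothing on a CPU-only machine
    launched = 0
    # Separate stream so the filler work is not queued on the training stream.
    stream = torch.cuda.Stream(device=device)
    with torch.cuda.stream(stream):
        a = torch.randn(size, size, device=device)
        b = torch.randn(size, size, device=device)
        while not stop_event.is_set():
            torch.mm(a, b)        # result is discarded; only the GPU load matters
            launched += 1
            stream.synchronize()  # wait for this matmul so the queue stays short
    return launched


if __name__ == "__main__":
    stop = threading.Event()
    # One filler thread per visible GPU (no threads on a CPU-only machine).
    threads = [
        threading.Thread(target=busy_wait_on_gpu, args=(i, stop))
        for i in range(torch.cuda.device_count())
    ]
    for t in threads:
        t.start()
    # ... the forward / backward / optimizer step would run here ...
    stop.set()
    for t in threads:
        t.join()
```

One concern with this sketch is that the filler matmuls run the whole time rather than only during synchronization, so they may compete with the real training kernels (and with the Python threads nn.DataParallel uses) and slow training down. What I am really after is a way to run the filler work only while a GPU is actually waiting.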