How to utilize idle CPU time to improve overall training speed

My code is (I used netD.cuda() and netG.cuda()):

    self.optimizer_D.zero_grad()
    self.backward_D()
    self.optimizer_D.step()

    self.optimizer_G.zero_grad()
    self.backward_G()
    self.optimizer_G.step()

After executing self.optimizer_G.step(), the GPU receives the work and starts computing the results; it takes about 0.1 s to finish. During this 0.1 s, however, the CPU is doing nothing. I want to know how to use this idle CPU time to improve overall training speed. BTW, I have only one GPU in my computer.

Or is there another way to improve training speed in this single-GPU case? Thanks a lot.

How about using a DataLoader with multiprocessing (num_workers > 0) to load the data?
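
A minimal sketch of that suggestion, in case it helps. The dataset here is random placeholder data, and `build_loader` and `train_one_epoch` are hypothetical helper names, not part of the original code. The idea is that worker processes prepare the next batches on the CPU while the GPU is still busy with the current step, and pinned memory plus `non_blocking=True` may let the host-to-device copy overlap with compute. Whether this actually speeds things up depends on how expensive the data loading is.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_loader(dataset, batch_size=4, num_workers=2):
    """Return a DataLoader that prepares batches in background worker processes."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # > 0: batches are loaded in parallel on the CPU
        pin_memory=torch.cuda.is_available(),  # page-locked memory allows async copies
    )


def train_one_epoch(loader, device):
    """Iterate once over the loader; the D/G updates from the question go inside."""
    n_batches = 0
    for real, _ in loader:
        # non_blocking=True lets the copy overlap with GPU work when memory is pinned
        real = real.to(device, non_blocking=True)
        # ... optimizer_D / optimizer_G steps would go here (omitted in this sketch) ...
        n_batches += 1
    return n_batches


if __name__ == "__main__":
    # Placeholder data standing in for a real image dataset (assumption).
    data = TensorDataset(torch.randn(16, 3, 8, 8), torch.zeros(16))
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(train_one_epoch(build_loader(data), device))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes with spawn (for example Windows), where the script is re-imported in each worker. A reasonable starting point for `num_workers` is a small number such as 2 to 4, tuned by measuring the time per iteration.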