Using matplotlib during training (avoiding overhead from CPU-side processing)

During my optimization loop I need to visualize a lot of my training output using matplotlib and other tools. This means copying data from GPU to CPU and then using matplotlib to plot (and save) it, which takes quite a while.

This has become so expensive that a large amount of time is now spent plotting these intermediate results while GPU utilization drops to 0. I’ve been looking into flags like non_blocking, pin_memory, etc., but they seem to focus mostly on CPU → GPU memory transfers.

Is there any way that plotting and training can be done asynchronously? Ideally, I’d like to continue training on the GPU while the CPU handles “minor tasks” like plotting, saving to disk, etc.

Thanks

you’ll have to implement this yourself, preferably as off-process jobs. For plotting, TensorBoard is an example of this approach; if your plots are simple line plots, you can use it directly (PyTorch has a supporting module, torch.utils.tensorboard).

Are there any examples of how this could be done with off-process jobs? I’ve never implemented something like this in Python.

Regarding the type of logging information: I have a few line plots, but I also need to show a bunch of images (plt.imshow and subplots). I already have TensorBoard logging in place, but I’d like an additional source of intermediate output information.

well, there are a lot of ways to “send off” data arrays; the choice also depends on your “renderer”, i.e. browser vs. GUI thread loop, incremental updates vs. full reprocessing, etc.

you can try just doing the matplotlib work in a Python thread; I’m not sure how badly this will affect training (Python’s Global Interpreter Lock is the potential problem).
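The in-thread variant might look like the sketch below, assuming the Agg backend (matplotlib’s GUI backends are generally not safe to drive from a worker thread). Saving figures releases the GIL during disk I/O, but building the figure itself still competes with the training loop for the interpreter; the names (plotter, the job tuple, the filename pattern) are again just illustrative.

```python
# Sketch: a single background thread that drains a queue of (step, image)
# jobs and renders them with Agg, while the main thread keeps "training".
import queue
import threading

import matplotlib
matplotlib.use("Agg")  # must be set before pyplot is imported
import matplotlib.pyplot as plt
import numpy as np

jobs = queue.Queue()


def plotter():
    """Single consumer thread; one thread keeps pyplot usage serialized."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop
            break
        step, img = job
        fig, ax = plt.subplots()
        ax.imshow(img)
        fig.savefig(f"imshow_step_{step}.png")
        plt.close(fig)


worker = threading.Thread(target=plotter, daemon=True)
worker.start()

for step in range(2):  # stands in for the training loop
    img = np.random.rand(32, 32)  # e.g. tensor.detach().cpu().numpy()
    jobs.put((step, img))  # cheap; rendering happens in the worker

jobs.put(None)
worker.join()
```

Using exactly one worker thread sidesteps pyplot’s lack of thread safety; if the GIL contention turns out to matter, the same queue-based structure moves over to a multiprocessing.Process with little change.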