During my optimization loop I need to visualize a lot of my training output using matplotlib and other tools. To do that, I have to copy data from GPU to CPU and then plot (and save) it with matplotlib, which takes quite a while.
This has become so expensive that a large share of the time is now spent plotting these intermediate results, while GPU utilization drops to 0. I’ve been looking into flags like non_blocking, pin_memory etc., but they seem to focus mostly on CPU → GPU memory transfers.
Is there any way to do plotting and training asynchronously? Ideally, I’d like to continue training on the GPU while the CPU handles “minor tasks” like plotting, saving to disk etc.
You’ll have to program this yourself, preferably as off-process jobs. For plotting, TensorBoard is an example of this approach; if your plots are simple line plots, you can use it (PyTorch has a supporting module, torch.utils.tensorboard).
Are there any examples of how this could be done with off-process jobs? I have never implemented something like this in Python.
As for the type of logging information: I have a few line plots, but I also need to show a bunch of images (plt.imshow and subplots). I already have TensorBoard logging in place, but I’d like an additional source of intermediate output.
Well, there are a lot of ways to “send off” data arrays; the choice also depends on your “renderer”, i.e. browser or GUI thread loop, incremental updates vs. reprocessing, etc.
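One way the off-process route could look, as a minimal sketch: it assumes the arrays are already on the CPU (for example via `tensor.detach().cpu().numpy()`) and that matplotlib is installed. The names `render_images` and `worker_loop` are made up for illustration, and this is only one of several possible designs.

```python
import multiprocessing as mp
import os
import random
import tempfile


def render_images(job):
    """Hypothetical renderer: draw a row of images and save it as a PNG."""
    # Import inside the worker so the training process never loads matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no window needed
    import matplotlib.pyplot as plt

    images, path = job
    fig, axes = plt.subplots(1, len(images), squeeze=False)
    for ax, img in zip(axes[0], images):
        ax.imshow(img)
        ax.axis("off")
    fig.savefig(path)
    plt.close(fig)  # release the figure, otherwise memory grows with every job


def worker_loop(jobs, render):
    """Call render(job) for each queued job until the None sentinel arrives."""
    done = 0
    while True:
        job = jobs.get()
        if job is None:
            return done
        render(job)
        done += 1


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp(prefix="plots_")
    jobs = mp.Queue(maxsize=8)  # bounded, so a slow plotter cannot eat all RAM
    proc = mp.Process(target=worker_loop, args=(jobs, render_images), daemon=True)
    proc.start()

    for step in range(3):
        # Stand-in data: four 8x8 "images" as nested lists. In training these
        # would be CPU arrays, e.g. tensor.detach().cpu().numpy().
        images = [[[random.random() for _ in range(8)] for _ in range(8)]
                  for _ in range(4)]
        # put() hands the job to a feeder thread and returns; it only blocks
        # when the queue is full.
        jobs.put((images, os.path.join(out_dir, f"step_{step:05d}.png")))

    jobs.put(None)  # tell the worker to finish
    proc.join()
    print("output directory:", out_dir)
```

The bounded queue is a deliberate choice: if plotting falls behind, `put` blocks instead of letting pending jobs pile up in memory.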
You can also try just doing the matplotlib work in a Python thread; I’m not sure how badly this will affect training (Python’s global interpreter lock is the potential problem).
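A rough sketch of the thread variant, assuming matplotlib is installed. `BackgroundPlotter` and `plot_losses` are hypothetical names, not part of any library. The renderer builds a `Figure` directly instead of going through pyplot, because pyplot keeps global state and is not thread-safe.

```python
import os
import queue
import tempfile
import threading


def plot_losses(job):
    """Hypothetical renderer: save a loss curve using matplotlib's OO API."""
    from matplotlib.figure import Figure  # no pyplot, so no shared global state

    losses, path = job
    fig = Figure(figsize=(4, 3))
    ax = fig.subplots()
    ax.plot(losses)
    ax.set_xlabel("step")
    ax.set_ylabel("loss")
    fig.savefig(path)


class BackgroundPlotter:
    """Runs render(job) on one background thread, in submission order."""

    def __init__(self, render, maxsize=8):
        self._render = render
        self._jobs = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel from close()
                break
            try:
                self._render(job)
            except Exception as exc:  # a failed plot should not stop training
                print(f"plot job failed: {exc!r}")

    def submit(self, job):
        self._jobs.put(job)  # blocks only when the queue is full

    def close(self):
        self._jobs.put(None)
        self._thread.join()  # wait until all queued jobs are rendered


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp(prefix="plots_")
    plotter = BackgroundPlotter(plot_losses)
    plotter.submit(([0.9, 0.5, 0.3, 0.2], os.path.join(out_dir, "loss.png")))
    plotter.close()
    print("output directory:", out_dir)
```

The plotting thread still competes with the training loop for the GIL while it runs Python code, so whether this is fast enough is something to measure; if it is not, the same queue-and-sentinel pattern carries over to a separate process.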