I know that in TF, if we use a Supervisor, it saves checkpoints and lets us restore the visualizations from the point where training stopped, using a global step variable. In PyTorch there's a pytorch-tensorboard library available, but there doesn't seem to be any support for something like the Supervisor, so I'm wondering what PyTorch users generally do when you want to visualize the accuracy curves etc. with TensorBoard across the whole run, from the start of training until the end, after training has stopped unexpectedly (e.g. in the event of a system crash). In other words, how do you "resume" the visualizations?
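For context, here is the pattern I imagine people use: persist the global step in the training checkpoint, and on restart reload it so every logged scalar continues at the correct x-axis position instead of starting over at step 0. This is a minimal stdlib-only sketch of that idea; the `checkpoint.json` filename and the `save_checkpoint`/`load_checkpoint` helpers are hypothetical, and in real PyTorch code you'd put `global_step` into the same dict you pass to `torch.save` with the model/optimizer state, then pass it as the step argument to your TensorBoard writer's `add_scalar` calls.

```python
import json
import os

CKPT = "checkpoint.json"  # hypothetical path; in practice a .pt file via torch.save

def save_checkpoint(global_step, metrics):
    # Persist the global step alongside (stand-in) training state so a
    # restart can resume logging at the correct step.
    with open(CKPT, "w") as f:
        json.dump({"global_step": global_step, "metrics": metrics}, f)

def load_checkpoint():
    # Return saved state, or a fresh state if no checkpoint exists yet.
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"global_step": 0, "metrics": []}

# First "run": train for 3 steps, checkpoint, then pretend we crashed.
state = load_checkpoint()
step = state["global_step"]
for _ in range(3):
    step += 1
    # In real code: writer.add_scalar("train/acc", acc, global_step=step)
    state["metrics"].append({"step": step, "acc": 0.1 * step})
save_checkpoint(step, state["metrics"])

# Second "run" after the crash: step resumes from 3, so the curve lines up.
state = load_checkpoint()
step = state["global_step"]  # 3, not 0
for _ in range(2):
    step += 1
    state["metrics"].append({"step": step, "acc": 0.1 * step})
save_checkpoint(step, state["metrics"])
```

The key point is that the step counter is part of the checkpointed state, not a local loop variable, so the second run logs steps 4 and 5 rather than 1 and 2.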
Alternatively, do you guys find TensorBoard useful at all? At this point I'm quite inclined to move away from the usual TF tooling and simply parse the scalar data (e.g. dumped as JSON) myself to build the visualizations I want (matplotlib looks very pretty with seaborn too). There's more control this way, although it's more work.
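To make the "parse it myself" alternative concrete, this is roughly what I have in mind: append one JSON record per logging step during training, then load the file afterwards and plot with matplotlib. The `scalars.jsonl` filename and the record fields are made up for illustration; the plotting call is left as a comment since it needs matplotlib installed.

```python
import json

LOG = "scalars.jsonl"  # hypothetical log file, one JSON record per line

# During training, append one record per logging step (here: fake data).
with open(LOG, "w") as f:
    for step in range(1, 6):
        f.write(json.dumps({"step": step, "train_acc": 0.1 * step}) + "\n")

# After training (or a crash), parse the whole log back into lists.
steps, accs = [], []
with open(LOG) as f:
    for line in f:
        rec = json.loads(line)
        steps.append(rec["step"])
        accs.append(rec["train_acc"])

# With matplotlib/seaborn (assumed installed), plotting is then just:
#   import matplotlib.pyplot as plt
#   plt.plot(steps, accs)
#   plt.xlabel("step"); plt.ylabel("accuracy"); plt.show()
```

Because the log is append-only, a crashed run loses nothing already written, and resuming is just continuing to append; the plot always covers the full history.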
I think it would be interesting if there were a "best practices" standard for this, or if you guys could share what has worked best for you. Thank you!