Training metrics on GPU: good habits


This is more of a curiosity post than a question. I just want to know how you record your training metrics when you're using a GPU. I'm used to printing each epoch's metrics to a separate file: I open the file at every epoch and print the train loss, test loss, and so on. I know this is highly inefficient, because I'm opening a file and writing on every epoch, but with small datasets and models I really don't care; the difference is not that noticeable.
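To make it concrete, the pattern I mean looks roughly like this (a minimal sketch with made-up loss values and a hypothetical `metrics.csv` filename):

```python
# Naive per-epoch logging: open the file and write on every single epoch.
def log_epoch(path, epoch, train_loss, test_loss):
    # Append mode means a fresh open/flush/close on every call
    with open(path, "a") as f:
        f.write(f"{epoch},{train_loss:.6f},{test_loss:.6f}\n")

for epoch in range(5):
    # Dummy values standing in for real train/test losses
    train_loss, test_loss = 1.0 / (epoch + 1), 1.2 / (epoch + 1)
    log_epoch("metrics.csv", epoch, train_loss, test_loss)
```

The upside is that the file is always up to date on disk, even if training crashes mid-run.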

But when working with a GPU and huge models and datasets, there's a big difference, because the computations are so fast that the performance bottleneck becomes the metric printing itself. I'm just curious how you solve this problem: do you just plot the curves at the end? Do you write everything to disk at the end?
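One option I'm considering, in case it helps frame the question: accumulate everything in memory during training and do a single write at the end. A rough sketch (again with dummy losses and a hypothetical `metrics.csv`):

```python
# Accumulate metrics in RAM during training, write to disk once at the end.
import csv

history = []  # one dict per epoch, kept in memory for the whole run

for epoch in range(5):
    # Dummy values standing in for real train/test losses
    train_loss, test_loss = 1.0 / (epoch + 1), 1.2 / (epoch + 1)
    history.append({"epoch": epoch, "train_loss": train_loss, "test_loss": test_loss})

# Single open/write/close for the entire training run
with open("metrics.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["epoch", "train_loss", "test_loss"])
    writer.writeheader()
    writer.writerows(history)
```

The trade-off is that a crash mid-training loses the whole history, which is why I'm curious what people actually do in practice.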

Thank you so much :slight_smile: