I noticed that with the TensorFlow version of the code, the first epoch takes noticeably longer than the following epochs. I guess this is because TensorFlow caches all batches of data during the first epoch, so the subsequent epochs can run much faster?
Can I reimplement this behavior in PyTorch? It might reduce the total training time.
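In case it helps frame the question: one way to get a similar effect in PyTorch is a wrapper `Dataset` that memoizes each sample the first time it is loaded, so only the first epoch pays the loading/preprocessing cost. This is just a minimal sketch assuming the whole dataset fits in RAM; the `CachedDataset` name is illustrative, not from any library.

```python
import torch
from torch.utils.data import Dataset

class CachedDataset(Dataset):
    """Illustrative wrapper: caches each sample in memory on first
    access, so epochs after the first skip the expensive load step."""

    def __init__(self, base_dataset):
        self.base = base_dataset
        # One slot per sample; None means "not loaded yet".
        self.cache = [None] * len(base_dataset)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        if self.cache[idx] is None:
            # First epoch: do the real (slow) load and remember the result.
            self.cache[idx] = self.base[idx]
        return self.cache[idx]
```

One caveat: with `DataLoader(num_workers > 0)`, each worker process gets its own copy of the dataset, so each worker fills its own cache; the caching pays off most cleanly with `num_workers=0` or a shared-memory store.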
Thanks for your advice.