I have a question about running inference on a model that is already loaded in memory.
I initialize the model, enable cuDNN benchmark mode (if on GPU), call
.eval(), and then run inference twice.
Time: GPU run 1 = 5 s
Time: GPU run 2 = 0.05 s
Time: CPU run 1 = 4 s
Time: CPU run 2 = 4 s
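For reference, here is a minimal sketch of the setup described above (the model is a hypothetical stand-in, not your actual network). The large gap between GPU run 1 and run 2 typically comes from one-time costs on the first CUDA call: context initialization, kernel loading, and cuDNN autotuning when `benchmark = True`. CPU inference has no comparable one-time setup, which is why both CPU runs take about the same time.

```python
import time
import torch

# Autotune conv algorithms on the first forward pass (GPU only);
# this adds to the first run's time but speeds up later runs.
torch.backends.cudnn.benchmark = True

# Hypothetical placeholder model, just to make the sketch runnable.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
model.eval()

x = torch.randn(32, 256)

def timed_inference(model, x):
    start = time.perf_counter()
    with torch.no_grad():
        model(x)
    if x.is_cuda:
        # GPU kernels run asynchronously; wait for them before stopping the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - start

t1 = timed_inference(model, x)  # "cold" first run
t2 = timed_inference(model, x)  # "warm" second run
print(f"run 1: {t1:.4f}s, run 2: {t2:.4f}s")
```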
By contrast, in TensorFlow (using tf.Session) the second CPU run also gets much faster.
Is there anything I can do to “cache” an inference model on the CPU, the way it happens on the GPU?
Thanks in advance,