I have a question about running inference on a model that is already loaded in memory.
I initialize the model, enable cuDNN benchmark mode (if on GPU), call
.eval(), and then run inference twice.
Time: GPU run 1 = 5 s
Time: GPU run 2 = 0.05 s
Time: CPU run 1 = 4 s
Time: CPU run 2 = 4 s
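For reference, here is a minimal sketch of the setup described above (the model is a hypothetical stand-in, not your actual network). The large gap between GPU run 1 and run 2 typically comes from one-time costs on the first CUDA call: context initialization, kernel loading, and cuDNN autotuning when `benchmark = True`. CPU inference has no comparable one-time setup, which is why both CPU runs take about the same time.

```python
import time
import torch

# Autotune conv algorithms on the first forward pass (GPU only);
# this adds to the first run's time but speeds up later runs.
torch.backends.cudnn.benchmark = True

# Hypothetical placeholder model, just to make the sketch runnable.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
model.eval()

x = torch.randn(32, 256)

def timed_inference(model, x):
    start = time.perf_counter()
    with torch.no_grad():
        model(x)
    if x.is_cuda:
        # GPU kernels run asynchronously; wait for them before stopping the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - start

t1 = timed_inference(model, x)  # "cold" first run
t2 = timed_inference(model, x)  # "warm" second run
print(f"run 1: {t1:.4f}s, run 2: {t2:.4f}s")
```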
By contrast, in TensorFlow (using tf.Session) the second CPU run also gets much faster.
Is there anything I can do to “cache” an inference model on the CPU, the way it happens on the GPU?
Thanks in advance,