At inference time, when models are loaded and unloaded very frequently, model loading time becomes crucial.
Right now, a typical 700 MB model takes up to 5 seconds to load from an SSD on a fast machine.
I was wondering whether this could be optimized somehow (e.g. by using HDF5 for the model itself instead of pickle?).
I would like to bring it down to 1 second (as it is with Lua Torch …)
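For reference, here is a minimal sketch of how I am measuring raw deserialization time. It uses stdlib pickle on a dummy checkpoint-like dict of byte buffers (names and sizes are made up for illustration; a real checkpoint would hold tensors, and loading it also pays for tensor reconstruction on top of this):

```python
import os
import pickle
import tempfile
import time

# Hypothetical stand-in for a model checkpoint: a dict mapping parameter
# names to large byte buffers (~50 MB total), serialized with stdlib pickle.
state = {f"layer{i}.weight": bytes(5 * 1024 * 1024) for i in range(10)}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)

# Time only the load (the part that matters for frequent load/unload cycles).
start = time.perf_counter()
with open(path, "rb") as f:
    loaded = pickle.load(f)
elapsed = time.perf_counter() - start

print(f"loaded {os.path.getsize(path) / 1e6:.0f} MB in {elapsed:.3f} s")
```

The numbers I quoted above come from this kind of measurement, scaled up to a real ~700 MB model file (and averaged over a few runs to smooth out OS page-cache effects).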
Cheers,
Vincent