Premise
I have a rather simple/small trained convolutional autoencoder model.
I want to run my test data through the model, but I have a lot of it (no, seriously… ~280 million inputs to test), so inference is slow even with batching.
My rather vague question is this:
What are the various clever PyTorch options for speeding up the model inference stage?
Details
Straight off the bat, there are some basic/easy options that we all know:
- Batch loading inputs during testing
- Using a GPU during testing (if the .to(device) step is not a limiting factor); a minimal version of such a loop is sketched below
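For concreteness, this is roughly the kind of loop I'm running today. The tiny autoencoder and random tensors are just stand-ins so the snippet is self-contained; the real model is my trained network and the real dataset wraps the ~280M inputs:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and data (placeholders for the trained autoencoder and real test set)
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1),
)
dataset = TensorDataset(torch.randn(10_000, 1, 28, 28))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).eval()

loader = DataLoader(
    dataset,
    batch_size=1024,     # as large as GPU memory allows
    shuffle=False,
    num_workers=2,       # overlap data loading with compute
    pin_memory=True,     # enables faster async host-to-device copies
)

results = []
with torch.inference_mode():                 # no autograd bookkeeping at all
    for (batch,) in loader:
        batch = batch.to(device, non_blocking=True)
        out = model(batch)
        results.append(out.cpu())            # move results off the GPU as we go
```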
But are there other approaches that tap into deeper PyTorch functionality?
e.g. if there were a big red button that let you duplicate the model across all the available GPU/CPU memory and then fire batches at the copies efficiently, or something along those lines… that's the sort of thing I'm angling at.
Alternatively, if that big red button doesn't exist, what would be the next best thing? I assume running in CPU mode and instantiating multiple model copies across cores in parallel would also work (though that feels more like an MPI thing than a PyTorch thing).
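To make the "big red button" idea more concrete, here is roughly what I imagine, sketched with torch.multiprocessing rather than MPI: one independent model replica per CPU process, each handling a shard of the inputs. The toy model, data sizes, and worker count are all placeholders, and I'm not sure this is the idiomatic way to do it (which is partly what I'm asking):

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def make_model():
    # Same stand-in autoencoder as above; in practice, load the trained weights here.
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1),
    )

def worker(rank, world_size, data, out_queue):
    torch.set_num_threads(1)          # avoid thread oversubscription across processes
    model = make_model().eval()       # one independent model replica per process
    shard = data[rank::world_size]    # simple strided shard of the inputs
    with torch.inference_mode():
        outs = [model(batch) for batch in shard.split(256)]
    out_queue.put((rank, torch.cat(outs)))

if __name__ == "__main__":
    data = torch.randn(8_192, 1, 28, 28)   # placeholder for the real test inputs
    world_size = 4                          # number of CPU worker processes
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(r, world_size, data, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in range(world_size))  # collect before joining
    for p in procs:
        p.join()
```

Is something like this sensible, or is there a more PyTorch-native way to replicate a model for pure inference?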
Sorry it’s such a vague question, but hopefully it sparks a small discussion, and others who get stuck can use this thread to find ways to speed up their own model inference too!
Many thanks in advance