Inference Code Optimizations + DataLoader

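Currently, all of your data preprocessing happens inside the loop over your DataLoader, i.e. sequentially in the main process. It should be faster if you move the preprocessing into your Dataset's __getitem__ method and use multiple workers (num_workers > 0 in the DataLoader), so that batches are loaded and preprocessed in parallel in background processes.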
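A minimal sketch of what that could look like, assuming a map-style PyTorch Dataset. The class name PreprocessedDataset, the make_loader helper, and the "divide by 255" preprocessing are hypothetical placeholders; swap in your own data and transforms.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PreprocessedDataset(Dataset):
    """Hypothetical dataset: raw samples are stored as-is and preprocessed per item."""

    def __init__(self, raw_samples, labels):
        self.raw_samples = raw_samples  # e.g. lists of raw pixel values in 0..255
        self.labels = labels

    def __len__(self):
        return len(self.raw_samples)

    def preprocess(self, raw):
        # Placeholder preprocessing: convert to a float tensor and scale to [0, 1].
        return torch.tensor(raw, dtype=torch.float32) / 255.0

    def __getitem__(self, idx):
        # Preprocessing runs here, i.e. inside the worker processes when num_workers > 0.
        x = self.preprocess(self.raw_samples[idx])
        y = torch.tensor(self.labels[idx], dtype=torch.long)
        return x, y


def make_loader(raw_samples, labels, batch_size=4, num_workers=2):
    dataset = PreprocessedDataset(raw_samples, labels)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,  # worker processes call __getitem__ in parallel
        pin_memory=torch.cuda.is_available(),  # faster host-to-GPU copies when a GPU is present
    )


if __name__ == "__main__":
    raw = [[i, i + 1, i + 2] for i in range(10)]
    labels = [i % 2 for i in range(10)]
    loader = make_loader(raw, labels)
    for x, y in loader:
        # The loop body now only receives ready-made batches (model forward pass goes here).
        print(x.shape, y.shape)
```

The main loop then only moves batches to the device and runs the model. A good num_workers value depends on your CPU core count and how expensive the preprocessing is, so it is worth trying a few values and timing them.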
Could you try that and see if it’s faster?
Let me know if that works for you.