Profiling PyTorch: pred.cpu() is responsible for 95% of runtime?

I posted a follow-up question in a new thread, using the insights gained from additional profiling runs.

Regarding forum etiquette: is it generally preferred to post follow-up questions in the same thread, or to create a new thread for each new question?