Are there any tips or techniques for optimizing speed in PyTorch?

My code runs slowly. Apart from using better GPUs, are there other ways to make it run faster?
For example, how can I load data faster?

A general rule is to avoid explicit Python for loops when possible.

When manipulating tensors, for example, try to vectorize as much as you can by using PyTorch's built-in functions instead of looping over dimensions.
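
As a rough illustration of what that means, here is a minimal sketch that computes the L2 norm of every row of a 2-D tensor twice: once with a Python loop and once with a single vectorized expression. The function names `row_norms_loop` and `row_norms_vectorized` and the tensor sizes are made up for this example. The actual speedup will depend on your shapes and hardware, so it is worth timing both on your own data.

```python
import torch


def row_norms_loop(x: torch.Tensor) -> torch.Tensor:
    """L2 norm of each row, using an explicit Python loop (usually slow)."""
    out = torch.empty(x.shape[0], dtype=x.dtype, device=x.device)
    for i in range(x.shape[0]):
        # One small tensor operation per row, each with Python overhead.
        out[i] = torch.sqrt((x[i] ** 2).sum())
    return out


def row_norms_vectorized(x: torch.Tensor) -> torch.Tensor:
    """Same result, computed for all rows at once."""
    # A single reduction over dim=1 replaces the whole loop.
    return torch.sqrt((x ** 2).sum(dim=1))


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(1000, 64)  # hypothetical example data
    # Both versions should agree up to floating-point rounding.
    assert torch.allclose(row_norms_loop(x), row_norms_vectorized(x), atol=1e-5)
```

The vectorized version issues one reduction instead of one operation per row, which is where the time is typically saved, especially on a GPU.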