I’m confused about the meaning of “performance” in this context. Does it refer to the model’s overall accuracy, or specifically its processing speed?
I noticed that calling tensor.contiguous() after permute eliminated a warning message, but it also made the model run significantly slower. What explains this trade-off?
Performance here means speed, not model convergence or accuracy.
You should not see any performance difference between calling contiguous() manually and letting the reducer do it, as explained here.
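For context, here is a minimal sketch of why the copy happens at all: permute only changes the tensor's strides, so the result is a non-contiguous view, and contiguous() materializes a fresh copy in memory. The shapes and values below are arbitrary, chosen just for illustration:

```python
import torch

x = torch.randn(2, 3, 4)

# permute returns a view with rearranged strides; no data is copied
y = x.permute(2, 0, 1)
print(y.is_contiguous())  # False

# contiguous() allocates new memory and copies the data into row-major order
z = y.contiguous()
print(z.is_contiguous())  # True

# same values, different memory layout
print(torch.equal(z, y))  # True
```

So the cost of contiguous() is a one-off memory copy either way; whether you pay it explicitly or let the reducer trigger it, the total work should be the same.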