I’ve got the profiler output below. I’m fairly comfortable with what each of the columns represents; however, I’m slightly confused as to why aten::convolution_backward appears to be the most time-consuming operation on both the GPU and the CPU.
Name                         Self CPU    CPU total   Self CUDA   CUDA total
aten::convolution_backward   347.071ms   42.355s     105.888s    105.549s
I understand that self CPU is the time spent directly within this operator’s own code, excluding the functions it calls. Given the difference between the self CPU and CPU total values, that implies aten::convolution_backward makes some function calls that are executed on the CPU. I’m a bit confused, though: shouldn’t the backward call run entirely on the GPU, given that’s where the model and the data are?
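To make the arithmetic behind that reading concrete, here is a minimal sketch in plain Python (no PyTorch needed). It assumes CPU total is simply self CPU plus the time spent in child calls, as described above. The helper name `time_in_children_ms` is my own and is not part of the profiler API; the two figures are taken from the table above.

```python
def time_in_children_ms(self_ms: float, total_ms: float) -> float:
    """Time attributed to child calls, assuming total = self + children."""
    return total_ms - self_ms


# Figures from the profiler row for aten::convolution_backward.
SELF_CPU_MS = 347.071          # Self CPU, already in milliseconds
CPU_TOTAL_MS = 42.355 * 1000   # CPU total, converted from seconds

children_ms = time_in_children_ms(SELF_CPU_MS, CPU_TOTAL_MS)
share = children_ms / CPU_TOTAL_MS

print(f"CPU time in child calls: {children_ms:.3f} ms")
print(f"Share of CPU total:      {share:.1%}")
```

So, under that assumption, almost all of the CPU total is attributed to whatever the operator calls rather than to the operator's own code, which is the part I can't explain.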