Understanding Profiler Output

I’ve got the profiler output below. I’m fairly comfortable with what each of the columns represents, but I’m slightly confused as to why aten::convolution_backward appears to be the most time-consuming operation on both the GPU and the CPU.

Name                        Self CPU     CPU total      Self CUDA      CUDA total
aten::convolution_backward  347.071ms    42.355s        105.888s       105.549s

I understand that Self CPU is the time spent directly within this operator’s code, excluding the function calls it makes. Given the difference between the Self CPU and CPU total values, that implies aten::convolution_backward makes some function calls that are executed on the CPU. I’m just a bit confused, as I would have assumed the backward call would run entirely on the GPU, given that’s where the model and the data are?
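To make my understanding of "self" versus "total" concrete, here is a minimal sketch of how I believe the two columns relate. It does not use the profiler itself: the `Event` class, the operator names, and the timings are all hypothetical, and I am assuming that self time is simply total time minus the total time of direct child calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical stand-in for one profiled operator call."""
    name: str
    total_us: int  # time from entry to exit, including child calls
    children: List["Event"] = field(default_factory=list)

    @property
    def self_us(self) -> int:
        # Assumption: self time = total time minus time spent in direct children.
        return self.total_us - sum(child.total_us for child in self.children)


# Made-up timings in microseconds, not taken from the output above.
child_a = Event("child_op_a", 300)
child_b = Event("child_op_b", 200)
parent = Event("parent_op", 600, [child_a, child_b])

print(parent.name, "self:", parent.self_us, "total:", parent.total_us)
```

If that model is right, a large gap between Self CPU and CPU total means most of the CPU-side time is attributed to whatever the operator calls, not to the operator's own code.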