CPU time vs. GPU time of ops in PyTorch

I am trying to profile a network with torch.autograd.profiler, and I need some explanation of the CPU and GPU times it reports. I assumed the two timings would be nearly equal, because the CPU time includes the kernel launch plus its execution. However, I cannot find a consistent relationship between the CPU and GPU times.

As shown below, some ops report approximately the same time, some report a CPU time larger than the GPU time, and some report a CPU time smaller than the GPU time. Could someone please explain the difference?

Name          CPU Time    GPU Time
relu          14.700us    15.936us
sub           112.447us   93.504us
mm            43.501us    46.912us
CatBackward   84.912us    84.704us
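
To make the inconsistency concrete, here is a small sketch that computes the CPU minus GPU difference and the CPU/GPU ratio for each op, using only the numbers from the table above. It is plain Python and does not call the profiler; the names `rows` and `compare_times` are hypothetical, chosen just for this illustration.

```python
# Timings copied from the table above, in microseconds: (op name, CPU time, GPU time).
rows = [
    ("relu", 14.700, 15.936),
    ("sub", 112.447, 93.504),
    ("mm", 43.501, 46.912),
    ("CatBackward", 84.912, 84.704),
]


def compare_times(rows):
    """Return {op: (cpu - gpu in us, cpu / gpu)} for each row."""
    result = {}
    for name, cpu_us, gpu_us in rows:
        diff_us = cpu_us - gpu_us   # positive means CPU time is larger
        ratio = cpu_us / gpu_us     # > 1 means CPU time is larger
        result[name] = (diff_us, ratio)
    return result


comparison = compare_times(rows)
for name, (diff_us, ratio) in comparison.items():
    print(f"{name:<12} diff = {diff_us:+8.3f} us   cpu/gpu = {ratio:.3f}")
```

With these numbers, sub has a CPU time about 19us larger than its GPU time, relu and mm have a CPU time a few microseconds smaller, and CatBackward is nearly identical, so the difference does not even have a consistent sign.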