[SOLVED] Make Sure That PyTorch Is Using the GPU To Compute

I've added my profiling results from torch.utils.bottleneck. I'm not sure what I can read from the results; in particular, if the CUDA time is roughly equal to or bigger than the CPU time, does that mean the GPU is being utilised? Thanks!

--------------------------------------------------------------------------------
  autograd profiler output (CPU mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

---------  ---------------  ---------------  ---------------  ---------------  ---------------
Name              CPU time        CUDA time            Calls        CPU total       CUDA total
---------  ---------------  ---------------  ---------------  ---------------  ---------------
stack        1995016.802us          0.000us                1    1995016.802us          0.000us
stack        1433562.687us          0.000us                1    1433562.687us          0.000us
stack        1418816.239us          0.000us                1    1418816.239us          0.000us
stack        1208400.125us          0.000us                1    1208400.125us          0.000us
stack        1109156.949us          0.000us                1    1109156.949us          0.000us
stack        1043755.894us          0.000us                1    1043755.894us          0.000us
stack         989006.451us          0.000us                1     989006.451us          0.000us
stack         988511.989us          0.000us                1     988511.989us          0.000us
stack         984434.292us          0.000us                1     984434.292us          0.000us
stack         980338.307us          0.000us                1     980338.307us          0.000us
stack         976940.691us          0.000us                1     976940.691us          0.000us
stack         955838.942us          0.000us                1     955838.942us          0.000us
stack         955763.458us          0.000us                1     955763.458us          0.000us
stack         952211.930us          0.000us                1     952211.930us          0.000us
stack         951751.424us          0.000us                1     951751.424us          0.000us

--------------------------------------------------------------------------------
  autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

	Because the autograd profiler uses the CUDA event API,
	the CUDA time column reports approximately max(cuda_time, cpu_time).
	Please ignore this output if your code does not use CUDA.

---------  ---------------  ---------------  ---------------  ---------------  ---------------
Name              CPU time        CUDA time            Calls        CPU total       CUDA total
---------  ---------------  ---------------  ---------------  ---------------  ---------------
stack        1348676.702us    1348687.500us                1    1348676.702us    1348687.500us
stack        1325784.279us    1325796.875us                1    1325784.279us    1325796.875us
stack        1301842.419us    1301843.750us                1    1301842.419us    1301843.750us
stack        1271585.903us    1271609.375us                1    1271585.903us    1271609.375us
stack        1269943.439us    1269953.125us                1    1269943.439us    1269953.125us
stack        1184606.802us    1184597.656us                1    1184606.802us    1184597.656us
stack        1176057.135us    1176062.500us                1    1176057.135us    1176062.500us
stack        1108025.533us    1108031.250us                1    1108025.533us    1108031.250us
stack        1095250.413us    1095257.812us                1    1095250.413us    1095257.812us
stack        1082371.450us    1082375.000us                1    1082371.450us    1082375.000us
stack        1080302.317us    1080312.500us                1    1080302.317us    1080312.500us
stack        1028030.105us    1028039.062us                1    1028030.105us    1028039.062us
stack        1015617.116us    1015625.000us                1    1015617.116us    1015625.000us
stack         861592.872us     861601.562us                1     861592.872us     861601.562us
stack         860586.499us     860593.750us                1     860586.499us     860593.750us
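Independently of the profiler output, a quick sanity check is to confirm that both the model's parameters and its inputs actually live on a CUDA device, since an op runs on the GPU only when its tensors do. This is a minimal sketch with a hypothetical stand-in model (`torch.nn.Linear`); substitute your own model and inputs:

```python
import torch

# Is a CUDA device visible to PyTorch at all?
# (Assumes the NVIDIA driver and a CUDA build of PyTorch are installed.)
print(torch.cuda.is_available())

# Pick the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical model and batch; replace with your own.
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)

out = model(x)
# If this prints cuda:0, the forward pass ran on the GPU;
# if it prints cpu, the computation never left the CPU.
print(out.device)
```

Checking `out.device` (or `next(model.parameters()).device`) like this removes any ambiguity about where the work is happening, regardless of how the profiler columns are interpreted.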