I am adding my profiling results from `torch.utils.bottleneck` below. I am not sure what I can read from them. In particular, if the CUDA time is bigger than the CPU time, does that mean the GPU is being utilised? Thanks!

```
--------------------------------------------------------------------------------
autograd profiler output (CPU mode)
--------------------------------------------------------------------------------
top 15 events sorted by cpu_time_total
--------- --------------- --------------- --------------- --------------- ---------------
Name CPU time CUDA time Calls CPU total CUDA total
--------- --------------- --------------- --------------- --------------- ---------------
stack 1995016.802us 0.000us 1 1995016.802us 0.000us
stack 1433562.687us 0.000us 1 1433562.687us 0.000us
stack 1418816.239us 0.000us 1 1418816.239us 0.000us
stack 1208400.125us 0.000us 1 1208400.125us 0.000us
stack 1109156.949us 0.000us 1 1109156.949us 0.000us
stack 1043755.894us 0.000us 1 1043755.894us 0.000us
stack 989006.451us 0.000us 1 989006.451us 0.000us
stack 988511.989us 0.000us 1 988511.989us 0.000us
stack 984434.292us 0.000us 1 984434.292us 0.000us
stack 980338.307us 0.000us 1 980338.307us 0.000us
stack 976940.691us 0.000us 1 976940.691us 0.000us
stack 955838.942us 0.000us 1 955838.942us 0.000us
stack 955763.458us 0.000us 1 955763.458us 0.000us
stack 952211.930us 0.000us 1 952211.930us 0.000us
stack 951751.424us 0.000us 1 951751.424us 0.000us
--------------------------------------------------------------------------------
autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
top 15 events sorted by cpu_time_total
Because the autograd profiler uses the CUDA event API,
the CUDA time column reports approximately max(cuda_time, cpu_time).
Please ignore this output if your code does not use CUDA.
--------- --------------- --------------- --------------- --------------- ---------------
Name CPU time CUDA time Calls CPU total CUDA total
--------- --------------- --------------- --------------- --------------- ---------------
stack 1348676.702us 1348687.500us 1 1348676.702us 1348687.500us
stack 1325784.279us 1325796.875us 1 1325784.279us 1325796.875us
stack 1301842.419us 1301843.750us 1 1301842.419us 1301843.750us
stack 1271585.903us 1271609.375us 1 1271585.903us 1271609.375us
stack 1269943.439us 1269953.125us 1 1269943.439us 1269953.125us
stack 1184606.802us 1184597.656us 1 1184606.802us 1184597.656us
stack 1176057.135us 1176062.500us 1 1176057.135us 1176062.500us
stack 1108025.533us 1108031.250us 1 1108025.533us 1108031.250us
stack 1095250.413us 1095257.812us 1 1095250.413us 1095257.812us
stack 1082371.450us 1082375.000us 1 1082371.450us 1082375.000us
stack 1080302.317us 1080312.500us 1 1080302.317us 1080312.500us
stack 1028030.105us 1028039.062us 1 1028030.105us 1028039.062us
stack 1015617.116us 1015625.000us 1 1015617.116us 1015625.000us
stack 861592.872us 861601.562us 1 861592.872us 861601.562us
stack 860586.499us 860593.750us 1 860586.499us 860593.750us
```
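
To make the comparison between the two columns concrete, here is a minimal sketch that parses a few rows copied from the CUDA-mode table above and prints how far apart the CPU and CUDA times are. The helpers `parse_rows` and `relative_gap` are hypothetical names made up for this illustration and are not part of PyTorch. For these rows the two columns differ by only about 10us on events that last over a second, which seems consistent with the profiler's own note that the CUDA time column reports approximately max(cuda_time, cpu_time). So a slightly larger CUDA time here may not, on its own, say much about whether the GPU is busy.

```python
import re

# Three rows copied verbatim from the CUDA-mode table above.
ROWS = """\
stack 1348676.702us 1348687.500us 1 1348676.702us 1348687.500us
stack 1325784.279us 1325796.875us 1 1325784.279us 1325796.875us
stack 1184606.802us 1184597.656us 1 1184606.802us 1184597.656us
"""

# Columns: Name, CPU time, CUDA time, Calls, CPU total, CUDA total.
ROW_RE = re.compile(
    r"^(\S+)\s+([\d.]+)us\s+([\d.]+)us\s+(\d+)\s+([\d.]+)us\s+([\d.]+)us$"
)


def parse_rows(text):
    """Hypothetical helper: return (name, cpu_us, cuda_us, calls) per row."""
    events = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            name, cpu, cuda, calls, _cpu_total, _cuda_total = match.groups()
            events.append((name, float(cpu), float(cuda), int(calls)))
    return events


def relative_gap(cpu_us, cuda_us):
    """Hypothetical helper: |CUDA - CPU| as a fraction of the CPU time."""
    return abs(cuda_us - cpu_us) / cpu_us


events = parse_rows(ROWS)
for name, cpu, cuda, calls in events:
    gap = relative_gap(cpu, cuda)
    print(f"{name}: cpu={cpu:.3f}us cuda={cuda:.3f}us relative gap={gap:.2e}")
```

The same parsing can be pointed at the full table by pasting all 15 rows into `ROWS`. The regular expression assumes the exact column layout shown in the output above.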