clip_grad_norm takes a very long time in training

I use clip_grad_norm in my training process. I profiled my code with the snakeviz package and found that this clipping step takes an enormous amount of time (1.6 h per iteration in total, of which clip_grad_norm takes 20 min).
I have a reference codebase that also clips gradients, and there the clipping takes very little time.
In the snakeviz analysis, the clipping in my code calls `<method 'norm' of 'torch._C.TensorBase' objects>`, while the reference code calls `<method 'norm' of 'torch._C.CudaFloatTensorBase' objects>`. I guess this is the problem.
I use torch 0.4.0 with Python 2; the reference code uses torch 0.3.0 with Python 3. What should I do?
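For context, a minimal sketch of how gradient clipping is typically wired into a training step. This uses the in-place `clip_grad_norm_` name from later PyTorch releases (in 0.4.0 it was `clip_grad_norm`), and a toy `nn.Linear` model as a stand-in; the point is that the parameters (and therefore their gradients) must live on the GPU for the norm to be computed by the CUDA tensor kernels rather than on the CPU:

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

# Put the model on the GPU if available; its gradients then live there too,
# so clip_grad_norm_ computes the norm with CUDA kernels instead of CPU ops.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 1).to(device)

x = torch.randn(4, 10, device=device)
loss = model(x).sum()
loss.backward()

# Clip gradients in place; returns the total gradient norm before clipping.
total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)
print(float(total_norm))
```

Note also that CUDA kernels launch asynchronously, so a wall-clock profiler like snakeviz can attribute time spent waiting for earlier GPU work to whatever operation happens to synchronize next, which can make the clipping call look slower than it really is.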
