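I don’t think gradient clipping is time-consuming. As far as I can tell, it’s just a few simple tensor operations (computing a norm and rescaling the gradients), which should be fast compared with the forward and backward passes.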
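
To illustrate why I'd expect it to be cheap, here is a rough pure-Python sketch of clipping by global L2 norm. The function name `clip_by_global_norm` and the toy gradient values are made up for illustration; this is not any framework's actual implementation (PyTorch's version is `torch.nn.utils.clip_grad_norm_`). The point is only that it takes one pass over the gradients to compute the norm and at most one more pass to rescale them.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Hypothetical helper: rescale gradients so their combined L2 norm
    does not exceed max_norm. Returns (clipped_grads, original_norm)."""
    # Pass 1: global L2 norm over every gradient element.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm <= max_norm:
        # Already within the limit, so return an unmodified copy.
        return [list(grad) for grad in grads], total_norm
    # Pass 2: multiply every element by the same scale factor.
    scale = max_norm / total_norm
    return [[g * scale for g in grad] for grad in grads], total_norm

# Toy gradients for two "parameters" (illustrative values only).
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
```

Both passes are linear in the number of parameters, so the cost should be small next to backpropagation, though I haven't profiled it on a real model.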