Which libraries are you using for the custom build and which are used in the
Also, why are you comparing the speed of such an old PyTorch version?
The libraries are listed in https://github.com/pytorch/pytorch/issues/46245. I am comparing speeds because I want to reproduce a user's training speed: his PyTorch 1.2 environment was installed with pip install, but I can only build from source because of our internal platform's limitations.
Once you have made sure the binary and your local build are equivalent, you could use profiling tools such as Nsight or the built-in profiler in PyTorch.
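For the built-in profiler, something along these lines might be a starting point. It is only a sketch: the tensors and the indexed assignment are a toy stand-in for a real training step, and the output file name is arbitrary. It uses the older `torch.autograd.profiler` interface, which matches 1.x releases; newer releases also provide `torch.profiler` and may warn about the `use_cuda` argument.

```python
import torch
from torch.autograd import profiler

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

# Toy stand-in for one training step (shapes are arbitrary assumptions).
x = torch.randn(1024, 64, device=device, requires_grad=True)
idx = torch.randint(0, 1024, (256,), device=device)
vals = torch.randn(256, 64, device=device)

with profiler.profile(use_cuda=use_cuda) as prof:
    y = x * 2          # non-leaf tensor, so in-place indexing is allowed
    y[idx] = vals      # indexed assignment, recorded by autograd as an index_put
    y.sum().backward()

# Per-op summary, slowest ops first.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# Trace that can be opened in chrome://tracing.
prof.export_chrome_trace("timeline.json")
```

Running the same script against the pip binary and the source build should give two tables and two traces that can be compared op by op.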
Also, note that your profiling should synchronize the device before starting and stopping the timer, but I assume you are already familiar with profiling PyTorch ops.
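In case it is useful to others reading along, a minimal sketch of that timing pattern could look like this. The helper name `time_step` and its parameters are made up for illustration; the synchronize function is passed in so the same helper works on any device.

```python
import time


def time_step(step_fn, sync_fn, warmup=3, iters=10):
    """Return the average wall-clock seconds per call of step_fn."""
    # Warm-up runs keep one-time costs (allocations, kernel selection) out of the timing.
    for _ in range(warmup):
        step_fn()
    sync_fn()  # drain already queued device work before starting the timer
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    sync_fn()  # wait for the timed work to finish before stopping the timer
    return (time.perf_counter() - start) / iters
```

On a GPU this would be called as `time_step(train_step, torch.cuda.synchronize)`, where `train_step` is a placeholder for a function that runs one training iteration. Without the two synchronize calls, the timer mostly measures how long it takes to launch the kernels, not how long they run.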
I have already profiled the training and saved the result to timeline.json. The most time-consuming op in each training step is IndexPutBackward (0.8s vs. 0.2s).