I recently built PyTorch from source with CUDA 11.7.0 and ran a BERT training job.
With torch v1.11.0, training throughput reaches 450 examples/sec, but with torch v1.12.0 it drops to 150 examples/sec, roughly a 3x slowdown.
Below is the installation command:
CFLAGS="-g0 -fno-gnu-unique" USE_CUPTI_SO=1 USE_KINETO=1 CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" MAX_JOBS=80 USE_SYSTEM_NCCL=1 CUDA_HOME=/usr/local/cuda python setup.py install
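To compare what each build actually picked up, a minimal sketch like the following could be run once under the v1.11.0 install and once under the v1.12.0 install, and the two outputs diffed. `collect_build_info` is a hypothetical helper name, and the fields it reports are only my guess at what is relevant here (CUDA and cuDNN versions plus the compile-time configuration).

```python
import importlib


def collect_build_info():
    """Return a dict describing the installed PyTorch build.

    Hypothetical helper for illustration; returns {"torch": None}
    if PyTorch cannot be imported in the current environment.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA version the build was compiled against (None for CPU-only builds)
        "cuda": torch.version.cuda,
        # cuDNN version as an integer, or None if cuDNN is unavailable
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
        # Multi-line string with compiler flags and enabled libraries
        "build_config": torch.__config__.show(),
    }


if __name__ == "__main__":
    for key, value in collect_build_info().items():
        print(f"{key}: {value}")
```

A difference in the cuDNN version or in the flags listed under `build_config` between the two builds would be a reasonable first thing to look at.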
Could anyone help me figure out what is causing this slowdown?