I have a network implementation and two machines with the same hardware. Specifically:
----------------------  --------------------------------------------------------------------------------
sys.platform            linux
Python                  3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
numpy                   1.19.2
detectron2              0.3 @/home/roboeye-2/repos/roboeye/nova/src/detectron2
Compiler                GCC 7.5
CUDA compiler           CUDA 11.1
detectron2 arch flags   7.5
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.7.1 @/home/roboeye-2/.virtualenvs/nova/lib/python3.6/site-packages/torch
PyTorch debug build     False
GPU available           True
GPU 0                   GeForce RTX 2080 Ti (arch=7.5)
CUDA_HOME               /usr/local/cuda
Pillow                  7.2.0
torchvision             0.8.2 @xxxx/lib/python3.6/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5
fvcore                  0.1.3.post20210306
cv2                     4.4.0
----------------------  --------------------------------------------------------------------------------
On one machine, the forward pass takes only 0.5 seconds at 40% GPU utilization.
On the other machine, however, the same operation takes 22 seconds at 100% GPU utilization.
Since the hardware is the same and the code is identical, I am quite lost as to what is causing the difference.
Can anyone suggest what I could try?
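
For reference, here is a minimal sketch of how the forward-pass timings above could be measured consistently on both machines. CUDA kernels are launched asynchronously, so a timer that does not synchronize with the GPU can give misleading numbers, and the first few calls usually include one-time setup cost. The helper names `time_forward` and `summarize` are made up for this post, and `model` / `inputs` in the comment are placeholders for the actual network and batch.

```python
import time
from statistics import median


def time_forward(forward, sync=None, warmup=3, iters=10):
    """Return a list of per-call wall-clock times (seconds) for `forward`.

    `forward` is any zero-argument callable that runs one forward pass.
    `sync` is an optional zero-argument callable that blocks until the
    device is idle. With PyTorch on a GPU this would be
    torch.cuda.synchronize; on CPU it can stay None.
    """
    def run_once():
        if sync is not None:
            sync()  # make sure earlier GPU work is not counted
        start = time.perf_counter()
        forward()
        if sync is not None:
            sync()  # wait until this forward pass has really finished
        return time.perf_counter() - start

    # Warm-up calls are executed but their timings are discarded.
    for _ in range(warmup):
        run_once()
    return [run_once() for _ in range(iters)]


def summarize(times):
    """Collapse a list of timings into min / median / max."""
    return {"min": min(times), "median": median(times), "max": max(times)}


# Hypothetical usage with PyTorch (model and inputs are placeholders):
#   import torch
#   with torch.no_grad():
#       times = time_forward(lambda: model(inputs), sync=torch.cuda.synchronize)
#   print(summarize(times))
```

If the median after warm-up is still around 22 seconds on the slow machine, the slowdown is in the steady-state forward pass itself and not in one-time initialization.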