Tensor.to_cuda() is slow when using our own .whl file built from source

We have several offline machines and want to deploy PyTorch on them. To do this, we need to build a wheel package containing all dependencies (just like the official wheel package) on one machine and install the built .whl file on the others. The result is:

(1) The .whl file is built on one machine and installs successfully on another.
(2) On the second machine, the step `lambda t: t.to_cuda()` takes a long time (about 3 minutes). This step moves the module's parameters to CUDA tensors. We guess the CUDA dependencies aren't correctly included or used.

Our build script is similar to https://github.com/pytorch/builder/blob/master/manywheel/build.sh. We can't use it directly since our system is Ubuntu and the machines are all offline. Our own script follows the same build procedure: set the environment variables as in build.sh (lines 5 to 11) -> run `python setup.py bdist_wheel` -> copy the dependencies into the wheel file (build_common.sh, lines 77 to 190; build_common.sh is called at the end of build.sh). Our Python version is 3.6.2 and our CUDA version is 9.0.
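For anyone following the same procedure, a small pre-build sanity check can catch a missing environment variable before the (long) compile starts. This is a hypothetical helper, not part of the builder scripts; the variable names follow the manywheel/build.sh conventions, but the exact required set is an assumption:

```python
import os

# Assumption: these are the build variables that matter for our case.
# TORCH_CUDA_ARCH_LIST in particular must include the target GPU's
# compute capability (e.g. "6.0" for a P100), or the first CUDA call
# on the target machine will trigger a slow JIT recompilation.
REQUIRED_VARS = ["TORCH_CUDA_ARCH_LIST", "TORCH_NVCC_FLAGS"]

def missing_build_env(env=None):
    """Return the names of required build variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: run this just before `python setup.py bdist_wheel`
# and abort if it returns a non-empty list.
```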

We hope someone can help us, or share their experience building a "manywheel" file. Thank you!

The problem came from a CUDA version mismatch, as ptrblck mentioned. We made a mistake when we wrote our own build script. Additionally, we have confirmed that the script https://github.com/pytorch/builder/blob/master/manywheel/build.sh can be used to build a wheel containing all dependencies.

What GPUs are you using?
Is only the first CUDA call taking a long time?
In the past this was due to a mismatch of your CUDA version, which resulted in PyTorch being recompiled for your GPU. Could you try it with CUDA 8 and see if it still takes that long?
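To check whether only the first CUDA call is slow, a small timing helper can be used: a very large first entry followed by small ones points to a one-time setup cost such as JIT recompilation. The helper below is pure Python; the torch usage at the bottom is a sketch only, since it needs a GPU:

```python
import time

def time_calls(fn, n=3):
    """Time n successive calls of fn and return the durations in seconds.

    A first entry that dwarfs the later ones indicates a one-time
    setup cost (e.g. the CUDA driver JIT-compiling PTX for the GPU).
    """
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

# Usage sketch (requires torch and a CUDA device, so shown as a comment):
# timings = time_calls(lambda: torch.zeros(1).cuda())
# print(timings)  # e.g. a ~180 s first entry, then milliseconds
```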

(1) The GPUs are all NVIDIA P100s, with driver 390.46.
(2) The problem only occurs during initialization, at model = model.to_cuda(). The time taken for each training and evaluation iteration is normal.
(3) It seems that the latest version of PyTorch (v0.4.1) requires CUDA 9.0 or 9.2. I'm not sure we can successfully compile it with CUDA 8. Maybe we can try it.

By the way, if you have experience building a PyTorch wheel containing all dependencies, just like the official .whl, could you please share it with us? Or tell us where to find a solution. It would be very helpful to us.

You could have a look at these scripts and see if you can adapt them to your platform etc.

Thank you @ptrblck, you are right: the problem came from the CUDA version mismatch. We checked our script again and found that we didn't set the environment variable $TORCH_CUDA_ARCH_LIST correctly, which caused the mismatch. The problem is solved now.
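For others hitting the same symptom: if $TORCH_CUDA_ARCH_LIST omits the target GPU's compute capability, the wheel ships no binary (SASS) code for it, and the driver JIT-compiles from PTX on the first CUDA call, which is the multi-minute delay described above. A hypothetical checker for the arch-list string (not part of PyTorch, just illustrative):

```python
def arch_list_covers(arch_list, capability):
    """Check whether a TORCH_CUDA_ARCH_LIST string (e.g. "3.5;5.0;6.0+PTX")
    contains binary code for a GPU of the given compute capability tuple.

    If this returns False, the first CUDA call on that GPU will either fail
    or fall back to slow JIT compilation from embedded PTX.
    """
    wanted = "%d.%d" % capability
    entries = [e.strip().replace("+PTX", "")
               for e in arch_list.replace(" ", ";").split(";") if e.strip()]
    return wanted in entries
```

A P100 has compute capability 6.0, so an arch list of "3.5;5.0" misses it, while "3.5;5.0;6.0" covers it.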
