Hello, I’m working through the CIFAR-10 tutorial and wondering how long the training usually takes to run. In CPU mode, my machine consistently takes 1h10m to complete the two epochs in the example. Are other people seeing similar times?
1h10m? Oh my, that’s a lot. It takes me about a minute.
What’s your OS and version?
How did you install pytorch?
Are you running this in a virtual machine, or on bare metal?
I hope that with this information, I can reproduce it.
Thanks for the speedy reply! Here’s my info:
OS- Ubuntu 14.04.1
Python- 3.6.1
Install- Pulled the repo and built it using "python setup.py install"
VM- No, but I am doing this through Xfce from a Windows box, if that matters.
Your install method is the reason why it’s so slow.
You have two options:
- Install the binary from pytorch.org (we provide wheel files and conda binaries)
- Install a proper BLAS library on your machine (MKL or OpenBLAS; MKL is available via Conda) and add it to the environment variable CMAKE_PREFIX_PATH
Otherwise pytorch will use super-unoptimized BLAS, and that’s the reason for the slowness. You can read one way of doing this here: https://github.com/pytorch/pytorch#from-source
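One rough way to check whether BLAS is the bottleneck is to time a single matrix multiply and look at the build configuration. This is only a sketch: the helper name `time_call` and the 512x512 matrix size are arbitrary choices for illustration, and the `torch.__config__` lookup assumes a reasonably recent PyTorch build (it is skipped if the attribute is missing).

```python
import time

def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

try:
    import torch

    a = torch.randn(512, 512)
    b = torch.randn(512, 512)
    # A matrix multiply goes straight through the BLAS backend.
    print("512x512 matmul: %.4f s" % time_call(lambda: torch.mm(a, b)))

    # Recent builds can report which BLAS they were compiled against.
    config = getattr(torch, "__config__", None)
    if config is not None:
        print(config.show())
except ImportError:
    print("torch is not installed in this environment")
```

Comparing the timing between the source build and the binary install on the same machine should show whether the two builds differ in BLAS performance.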
I tried both options, and neither seems to make a difference. However, I found an issue with torchvision when installing via conda. Where should I report that?
Another reason for your slowdown might be a large number of CPU cores.
Try setting these environment variables:
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
See if that fixes it?
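The same cap can also be applied from inside a Python script, as long as the variables are set before torch is imported. A minimal sketch, assuming the same limit of 4 threads as above; the helper name `limit_blas_threads` is made up for this example:

```python
import os

def limit_blas_threads(n=4):
    """Cap OpenMP/MKL threads. Call this before importing torch."""
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n)
    return n

requested = limit_blas_threads(4)
print("logical CPU cores:", os.cpu_count())

try:
    import torch  # imported only after the variables are set

    # PyTorch also exposes the intra-op thread count directly.
    torch.set_num_threads(requested)
    print("torch threads:", torch.get_num_threads())
except ImportError:
    print("torch not installed; the environment variables are set regardless")
```

The value 4 is just the number suggested in this thread; it may be worth trying a few values to see what works best on a given machine.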
That did it for me. The training just completed in less than 2 minutes. Thanks for all the help!
Why did that fix it? Just curious.
What effect does setting those two environment variables have on the speed of training?
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4