Is there a slimmed-down PyTorch for inference?

We developed a neural network in PyTorch, and when we deploy it we have to include PyTorch in the Docker image. The backend team complains that PyTorch is “too heavy”. Is there a lightweight package out there that can carry out the computation of a neural network developed in PyTorch?

We provide a CPU-only version of PyTorch without the CUDA functionality, and it is significantly smaller in size (about 5x smaller).

To install it, go to https://pytorch.org and, in the Install Selector, set CUDA to None.
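Once installed, a quick sanity check from Python confirms you got the CPU-only build (the exact version string will vary):

```python
import torch

print(torch.__version__)          # e.g. "1.5.0+cpu" -- the "+cpu" suffix marks the build
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False, since no CUDA runtime is bundled
```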

Hopefully your backend team is happy with this version.


Thank you very much! Do you see a possibility of separating out the inference functionality into a tiny package that only runs the models? That would be really awesome for deployment.

Thanks again for your help!

We actually provide something like that (a runtime that is ~5MB or less), but for mobile only at the moment, i.e. for iOS / Android: https://pytorch.org/mobile/home/
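The flow there is roughly: convert the model to TorchScript in Python, save it, and load the saved file from the small on-device runtime. A minimal sketch (resnet18 is only a stand-in; substitute your own model):

```python
import torch
import torchvision

# Any nn.Module works; resnet18 is just a stand-in example.
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# Trace with a representative input to produce a TorchScript module.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# The saved file is what the ~5MB mobile runtime loads on device.
traced.save("model.pt")
```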


Part of the challenge when you run on desktop / server is not so much the size of PyTorch itself, but the size of the dependencies that make deployment efficient.

On x86 CPU, you will want Intel MKL and MKLDNN (which the PyTorch install provides), which are themselves ~70MB and ~50+MB respectively. They are used for fast matrix multiplications and neural network operations.

On GPU, you will want CUDA and cuDNN, which take about 2GB.
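As an aside, you can check from Python which of these accelerated backends your installed build actually linked in; a quick sketch:

```python
import torch

# Which accelerated backends did this build link in?
print(torch.backends.mkl.is_available())     # Intel MKL: fast BLAS / matrix multiply
print(torch.backends.mkldnn.is_available())  # MKLDNN: fast neural-network primitives
print(torch.backends.cudnn.is_available())   # cuDNN: requires a CUDA build and a GPU
```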

Someone on Twitter suggested that ONNX Runtime might be worth looking into; including it here for people stumbling upon this thread.
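For reference, the usual path is to export the trained model to ONNX from PyTorch and then run it with the onnxruntime package alone; a minimal sketch (the tiny Sequential model and the "input"/"output" names are placeholders):

```python
import torch
import torch.nn as nn
import onnxruntime

# A stand-in model; substitute your own trained nn.Module.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Export to ONNX with a representative input.
dummy = torch.rand(1, 16)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Inference now needs only onnxruntime, not PyTorch.
session = onnxruntime.InferenceSession("model.onnx")
(result,) = session.run(None, {"input": dummy.numpy()})
print(result.shape)  # (1, 2)
```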

@smth PyTorch could consider shipping custom builds of the MKL libraries that strip out unwanted features. If that happens, most users won't have to go searching for an optimized inference engine.

In the PyTorch CPU-only wheels, we actually do statically link MKL and MKLDNN so that the unneeded symbols are stripped out; that is why the wheels are as small as they are.

The reality is, whether you use ONNX Runtime or the PyTorch wheel: if you want MKL and MKLDNN, you will pay the cost of the routines they ship. Their BLAS routines and their convolution routines are large (they ship a lot of precompiled codegen).

The PyTorch CPU-only wheel today ships at a size of 127.3MB for PyTorch 1.5.0.

Transitioning from MKLDNN to a Glow CPU backend would be great!

Then we would have to include precompiled Glow code instead?