Is there a slimmed-down PyTorch for inference?

We developed a neural network in PyTorch, and when we deploy it we have to include PyTorch in the Docker image. The backend team complains that PyTorch is “too heavy”. Is there a lightweight package out there that can carry out the computation of a neural network developed in PyTorch?

We provide a CPU-only version of PyTorch without the CUDA functionality, and it is significantly smaller in size (about 5x smaller).

To install it, go to https://pytorch.org and, in the Install Selector, set CUDA to None.
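Once installed, a quick sanity check from Python confirms you got the CPU-only build (the exact version string will vary):

```python
import torch

print(torch.__version__)          # e.g. "1.5.0+cpu" -- the "+cpu" suffix marks the build
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False, since no CUDA runtime is bundled
```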

Hopefully your backend team is happy with this version.


Thank you very much! Do you see a possibility of separating out the inference functionality into a tiny package that only runs the models? That would be really awesome for deployment.

Thanks again for your help!

We actually provide something like that (a runtime that is ~5MB or less), but for mobile only at the moment, i.e. for iOS / Android: https://pytorch.org/mobile/home/
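The flow there is roughly: convert the model to TorchScript in Python, save it, and load the saved file from the small on-device runtime. A minimal sketch (resnet18 is only a stand-in; substitute your own model):

```python
import torch
import torchvision

# Any nn.Module works; resnet18 is just a stand-in example.
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# Trace with a representative input to produce a TorchScript module.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# The saved file is what the ~5MB mobile runtime loads on device.
traced.save("model.pt")
```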


Part of the challenge when you run on desktop / server is not so much the size of PyTorch itself, but the size of the dependencies that make deployment efficient.

On x86 CPU, you will want Intel MKL and MKLDNN (which the PyTorch install provides), which are themselves ~70MB and ~50+MB respectively. They are used for fast matrix multiplications and neural network operations.

On GPU, you will want CUDA and cuDNN, which take about 2GB.
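As an aside, you can check from Python which of these accelerated backends your installed build actually linked in; a quick sketch:

```python
import torch

# Which accelerated backends did this build link in?
print(torch.backends.mkl.is_available())     # Intel MKL: fast BLAS / matrix multiply
print(torch.backends.mkldnn.is_available())  # MKLDNN: fast neural-network primitives
print(torch.backends.cudnn.is_available())   # cuDNN: requires a CUDA build and a GPU
```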

Someone on Twitter suggested that ONNX Runtime might be worth looking into; including it here for people stumbling upon this thread.
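For reference, the usual path is to export the trained model to ONNX from PyTorch and then run it with the onnxruntime package alone; a minimal sketch (the tiny Sequential model and the "input"/"output" names are placeholders):

```python
import torch
import torch.nn as nn
import onnxruntime

# A stand-in model; substitute your own trained nn.Module.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Export to ONNX with a representative input.
dummy = torch.rand(1, 16)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Inference now needs only onnxruntime, not PyTorch.
session = onnxruntime.InferenceSession("model.onnx")
(result,) = session.run(None, {"input": dummy.numpy()})
print(result.shape)  # (1, 2)
```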

@smth PyTorch could consider shipping custom builds of the MKL libraries that strip out unwanted features. If that happens, most users won't have to go searching for an optimized inference engine.

In the PyTorch CPU-only wheels, we actually do statically link MKL and MKLDNN so that the unneeded symbols are stripped out; that is why the wheels are as small as they are.

The reality is, whether you use ONNX Runtime or the PyTorch wheel: if you want MKL and MKLDNN, you will pay the cost of the routines they ship. Their BLAS routines and their convolution routines are large (they ship a lot of precompiled codegen).

The PyTorch CPU-only wheel today ships at a size of 127.3MB for PyTorch 1.5.0.

Transitioning from MKLDNN to a Glow CPU backend would be great!

Then we would have to include precompiled Glow code instead?