I want to deploy a standalone executable that uses CUDA-enabled libtorch to a variety of Linux execution environments. To make that easy, I want to package everything together into as small a binary as possible. Aside from the CUDA libraries themselves, one issue is libtorch_cuda.so. When I use nvprune to shrink it by removing some older architectures, it fails with an error saying the .so file is not relocatable. Are there any easy ways to remove specific architectures from libtorch_cuda.so? Could I strip kernels that I don't use in practice? Or statically link against my own pruned versions of CUDA libraries such as libcublas?
Any thoughts on this, or other tricks or ideas for reducing size?
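For context, the nvprune documentation describes it as pruning host object files and static libraries, and its own example prunes `libcublas_static.a`. A shared library like libtorch_cuda.so is not a relocatable input, which is presumably why it gets rejected. For the static-linking route, a minimal Python sketch of the kind of command I would expect to run is below. It assumes I only deploy to sm_80 and sm_86 GPUs (a hypothetical choice), uses nvprune's `-gencode` option (same syntax as nvcc's `--generate-code`) plus `-o`, and only builds the argument list rather than running it. `build_nvprune_cmd` and `libcublas_static_pruned.a` are hypothetical names.

```python
import shlex
from typing import List, Sequence


def build_nvprune_cmd(static_lib: str, archs: Sequence[int], out_lib: str) -> List[str]:
    """Build an nvprune command that keeps SASS only for the given SM versions.

    `archs` are compute-capability numbers such as 80 or 86 (hypothetical
    choice: use the GPUs you actually deploy to). nvprune expects a static
    library (.a) or relocatable object, not a shared .so.
    """
    if not archs:
        raise ValueError("need at least one architecture to keep")
    cmd = ["nvprune"]
    for a in archs:
        # Same syntax as nvcc --generate-code: keep the sm_XX cubin for this arch.
        cmd += ["-gencode", f"arch=compute_{a},code=sm_{a}"]
    # Input static archive, then the pruned output archive.
    cmd += [static_lib, "-o", out_lib]
    return cmd


if __name__ == "__main__":
    cmd = build_nvprune_cmd("libcublas_static.a", [80, 86], "libcublas_static_pruned.a")
    print(shlex.join(cmd))
    # To actually prune: subprocess.run(cmd, check=True) with nvprune on PATH.
```

The pruned archive would then be linked into the executable in place of the stock static library; this does nothing for libtorch_cuda.so itself, which would presumably need to be rebuilt with fewer architectures instead.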