Tracing-based selective build for CUDA kernels

Hey, I’m trying to reduce the size of my PyTorch installation for running inference. I only need to run a limited set of models. I saw that there is a way to trace models and create a selective build for CPU (PyTorch’s Tracing Based Selective Build | PyTorch). Would it be possible to do something similar for CUDA, i.e. shrink libtorch_cuda_cpp.so and libtorch_cuda_cu.so by including only the kernels used by these models?
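For context, the CPU selective-build flow starts from a list of the operators a model actually calls, and only those kernels are compiled in. A minimal sketch of collecting such an op list with `torch.jit.trace` (the `TinyModel` module here is just a made-up example, not part of the official tracer tooling, which produces a YAML file consumed by the build):

```python
import torch

class TinyModel(torch.nn.Module):
    """Toy model standing in for one of the models to be deployed."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel().eval()
traced = torch.jit.trace(model, torch.randn(1, 4))

# Collect the ATen operators the traced graph calls -- this is the
# kind of op list a selective build would consume to decide which
# kernels to keep.
ops = sorted({n.kind() for n in traced.graph.nodes()
              if n.kind().startswith("aten::")})
print(ops)
```

The open question in this thread is whether the same op list could also gate which CUDA kernels get compiled into the GPU libraries, rather than only the CPU ones.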

Good timing! There’s a discussion posted today, Enable Link Time Optimization in PyTorch 2.0 Release Binaries - Smaller, Faster, Better Binaries · Issue #93955 · pytorch/pytorch · GitHub, that you may want to weigh in on.
