Hi,
I am running in a DistributedDataParallel environment where each worker imports a JIT module and therefore calls torch.utils.cpp_extension.load(...). As a result, each worker compiles the JIT module from scratch, which takes ages. Is there a way to cache this loading efficiently?