cpp_extension.load in a distributed environment

Hi,
I am running in a DistributedDataParallel environment, where each worker imports a JIT module and therefore calls torch.utils.cpp_extension.load(...). As a result, each worker compiles the JIT module from scratch, which takes ages. Is there a way to cache this loading efficiently, so that the module is compiled only once and the other workers reuse the result?
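
For illustration, here is a minimal sketch of the kind of pattern I have in mind, not a confirmed solution: rank 0 builds into a fixed build directory, everyone waits at a barrier, and only then do the other ranks call load against the same directory. The helper name load_extension_once, the extension name my_ext, and the file my_ext.cpp are hypothetical. It also assumes that all ranks see the same build directory (for example a single node or a shared filesystem) and that a second load call on an unchanged, already-built directory skips recompilation.

```python
import os


def load_extension_once(name, sources, build_directory, rank, barrier, loader):
    """Let rank 0 build first; the other ranks load after the barrier.

    `loader` and `barrier` are passed in as callables so the ordering logic
    can be exercised without torch. In real use (assumption) they would be
    torch.utils.cpp_extension.load and torch.distributed.barrier.
    """
    os.makedirs(build_directory, exist_ok=True)
    module = None
    if rank == 0:
        # Only rank 0 triggers the (slow) compilation.
        module = loader(name=name, sources=sources, build_directory=build_directory)
    # Every rank must reach the barrier, including rank 0.
    barrier()
    if rank != 0:
        # Assumed to find the finished build in build_directory.
        module = loader(name=name, sources=sources, build_directory=build_directory)
    return module


# Hypothetical usage inside each worker, after init_process_group():
#
#   import torch.distributed as dist
#   from torch.utils.cpp_extension import load
#
#   my_ext = load_extension_once(
#       name="my_ext",
#       sources=["my_ext.cpp"],
#       build_directory="/tmp/my_ext_build",
#       rank=dist.get_rank(),
#       barrier=dist.barrier,
#       loader=load,
#   )
```

On multiple nodes without a shared filesystem, the same idea would presumably need one building rank per node instead of global rank 0.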