Share a PyTorch model between an inference and a training process, simply

I would like to share a PyTorch model between two processes: a trainer and an application (inference) module running as separate processes. I have implemented model save/load, but when the trainer is writing the file while the inference module tries to read it, it raises an error (obviously).
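For reference, the pattern I mean is roughly the following. The os.replace variant is my own guess at avoiding the partial-read error: the rename is atomic on a single filesystem, so the reader should see either the old or the new file, never a half-written one (helper names are made up):

    import os
    import torch

    def save_weights_atomic(model, path):
        # Hypothetical helper: write to a temp file first, then rename.
        # os.replace is atomic on POSIX, so a concurrent reader never
        # opens a partially written checkpoint.
        tmp_path = path + ".tmp"
        torch.save(model.state_dict(), tmp_path)
        os.replace(tmp_path, path)

    def load_weights(model, path):
        # Hypothetical helper: the reader side stays a plain torch.load
        model.load_state_dict(torch.load(path))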

I have looked through this forum and elsewhere, and it drives me crazy: everything is about complex solutions for training with multiple processes etc., which is not what I want. I am really wondering how hard it can be to simply share the state dict between two processes.

One attempt was using SharedMemoryManager; here is an example of what I wrote:

    from copy import deepcopy
    from multiprocessing import Process
    from multiprocessing.managers import SharedMemoryManager
    import sys

    import numpy as np

    with SharedMemoryManager() as shared_memory_manager:

        # Allocate one shared-memory block per numpy array
        for key in shared_memory_settings:
            shape = shared_memory_settings[key]["shape"]
            dtype = shared_memory_settings[key]["dtype"]
            nbytes = np.zeros(shape, dtype=dtype).nbytes
            shared_memory_settings[key]["shm"] = shared_memory_manager.SharedMemory(size=nbytes)

        # This is the part that fails: sys.getsizeof() only measures the
        # dict object itself, not the tensor storage inside the state dict
        shared_memory_settings["state_dict"] = shared_memory_manager.SharedMemory(
            size=sys.getsizeof(model.model.state_dict())
        )

        # Initialize and start the two processes
        p1 = Process(target=Trainer, args=(shared_memory_settings, deepcopy(model), model_weights_path))
        p2 = Process(target=Application, args=(dataset, model, model_weights_path, shared_memory_settings))
        p1.start()
        p2.start()
        p1.join()
        p2.join()

I am trying to use SharedMemory, which works fine with NumPy arrays, but it gives me a hard time sharing the state dict.
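The closest I can picture for the state dict is one shared block per tensor, with NumPy views over shm.buf on both sides. A rough, untested sketch (the helper names are mine, and there is no locking, so a reader could still catch a half-updated dict):

    import numpy as np
    import torch

    def write_state_dict(state_dict, shm_blocks):
        # shm_blocks: {param_name: SharedMemory}, each created once with
        # size equal to the corresponding tensor's nbytes
        for name, tensor in state_dict.items():
            arr = tensor.detach().cpu().numpy()
            view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm_blocks[name].buf)
            view[...] = arr  # copy the current weights into shared memory

    def read_state_dict(template, shm_blocks):
        # template: a state dict from the same architecture, used only
        # for its shapes and dtypes
        out = {}
        for name, tensor in template.items():
            arr = tensor.detach().cpu().numpy()
            view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm_blocks[name].buf)
            out[name] = torch.from_numpy(view.copy())  # copy out of shm
        return out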

So my question is: what is the best practice for this very simple problem of using model weights for inference while a separate, independent process trains and updates those weights on a regular basis?
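One candidate I ran into is torch.multiprocessing together with Module.share_memory(), which moves the parameters into shared memory so that the trainer's in-place optimizer steps become visible to the other process. A minimal sketch of how I understand it (untested, and a read can race with a step that is half applied):

    import torch
    import torch.nn as nn
    import torch.multiprocessing as mp

    def trainer(model):
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        for _ in range(1000):
            loss = model(torch.randn(8, 4)).pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()  # in-place update lands directly in shared memory

    def application(model):
        with torch.no_grad():
            for _ in range(100):
                model(torch.randn(1, 4))  # sees the latest weights

    if __name__ == "__main__":
        model = nn.Linear(4, 1)
        model.share_memory()  # move parameters and buffers to shared memory
        p1 = mp.Process(target=trainer, args=(model,))
        p2 = mp.Process(target=application, args=(model,))
        p1.start()
        p2.start()
        p1.join()
        p2.join()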

I have looked into queues, but at first glance they seem to overcomplicate things. Perhaps that is the way I should go?
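For the record, this is how I picture the queue version, in case someone can confirm it is the intended approach. A sketch (untested; the trainer clones the weights so it can keep training after the snapshot is queued, and torch.multiprocessing moves the tensors through shared memory under the hood):

    import torch
    import torch.nn as nn
    import torch.multiprocessing as mp

    def trainer(q):
        model = nn.Linear(4, 1)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        for step in range(100):
            loss = model(torch.randn(8, 4)).pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            if step % 10 == 0:
                # snapshot: clone so the trainer can keep mutating its copy
                q.put({k: v.detach().clone() for k, v in model.state_dict().items()})

    def application(q):
        model = nn.Linear(4, 1)
        for _ in range(10):
            model.load_state_dict(q.get())  # blocks until fresh weights arrive
            # ... run inference with the updated model ...

    if __name__ == "__main__":
        q = mp.Queue()
        p1 = mp.Process(target=trainer, args=(q,))
        p2 = mp.Process(target=application, args=(q,))
        p1.start()
        p2.start()
        p1.join()
        p2.join()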
