Copy model weights between processes

Hello,

I have a few processes running on the CPU that generate data, and one process (potentially more in the future) that consumes the produced data to train my model. The issue is that when all the processes run on the CPU, including the training process, the loss drops; but when the training model is on the GPU (MPS specifically), the loss does not drop. What am I doing wrong? Below is the logic of my code:

import copy

model = MyModel()
model.share_memory()  # so that all processes share the same weights
start_processes_collect_data(model)
train_model = copy.deepcopy(model)
train_model = accelerator.prepare(train_model)  # I use the Hugging Face accelerator to set the device
start_training(train_model)
model.load_state_dict(train_model.state_dict())
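To make the pattern concrete, here is a minimal self-contained sketch of what I am trying to do, using a plain `nn.Linear` as a stand-in for `MyModel` and an in-place update as a stand-in for the training step (both are just placeholders, and I keep everything on the CPU here):

```python
import copy

import torch
import torch.nn as nn

# Stand-in for MyModel (hypothetical).
model = nn.Linear(4, 2)
model.share_memory()  # shared tensors, so collector processes see the weights

# Training copy; accelerator.prepare() would move this to the device (MPS).
train_model = copy.deepcopy(model)

# Stand-in for a training step that changes train_model's weights.
with torch.no_grad():
    for p in train_model.parameters():
        p.add_(1.0)

# Copy the trained weights back into the shared model. If train_model lives
# on a GPU, the state dict tensors are moved to the CPU first;
# load_state_dict copies values in place, so the shared model's parameters
# stay in shared memory.
cpu_state = {k: v.cpu() for k, v in train_model.state_dict().items()}
model.load_state_dict(cpu_state)
```

After the copy, `model`'s parameters are still shared (`p.is_shared()` is `True` for all of them) and match `train_model`'s values.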

Thanks in advance