PyTorch multiprocessing

Hi! I am trying to use PyTorch to solve an optimization problem with gradient descent. Basically, I have a model with a parameter v, and for each of my 7 experiments the model sequentially runs a forward process that calls the calculate_labeling function with v as the input. The outputs of those forward passes are aggregated and passed to the loss function, which is then backpropagated to update v. I want to parallelize the forward process across experiments with torch.multiprocessing, and I have access to a multi-core CPU.
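For reference, here is roughly what the serial version looks like; calculate_labeling, the loss, and the tensor shapes below are just stand-ins for my real code:

import torch

# Rough serial sketch of my current setup; calculate_labeling is a stand-in
# for my real forward computation and the loss is simplified.
def calculate_labeling(v, device):
    return (v.to(device) ** 2).sum().unsqueeze(0)

v = torch.randn(10, requires_grad=True)   # the parameter I'm optimizing
optimizer = torch.optim.SGD([v], lr=1e-2)
experiments = range(7)                     # placeholder for my 7 experiments

for step in range(100):
    optimizer.zero_grad()
    # one forward call per experiment, currently run sequentially
    results = [calculate_labeling(v, "cpu") for _ in experiments]
    loss = torch.stack(results).sum()      # aggregate, then a simplified loss
    loss.backward()                        # these backward passes are the slow part
    optimizer.step()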

This is the general code I have been using, but I ran into the error below. Does anyone have tips on how I could either speed this up in a different way or achieve this parallelization? The backward calls on each of these forward passes take up the bulk of the time, so I was hoping to parallelize those specifically.

RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).

import torch
import torch.multiprocessing as mp

    def forward(self):
        output_queue = mp.Queue()

        # Container for processes
        processes = []

        # Loop over experiments and spawn a process for each calculation
        # (self.v is the shared parameter tensor with requires_grad=True)
        for the_experiment in self.experiments:
            p = mp.Process(
                target=self.worker,
                args=(the_experiment, self.v, self.device, output_queue),
            )
            p.start()
            processes.append(p)

        # Collect results (drain the queue before joining)
        labeling_results = [output_queue.get() for _ in self.experiments]

        # Make sure all processes have finished
        for p in processes:
            p.join()

        return torch.stack(labeling_results).squeeze(0)

    @staticmethod
    def worker(the_experiment, v, device, output_queue):
        # Run the forward computation for one experiment and push the result
        the_labeling = calculate_labeling(v, device)

        output_queue.put(the_labeling)