Multiprocessing inside forward on single GPU

Hi, I want to run two lines in parallel inside the forward function on a single GPU.
The two sub-processes are independent of each other. I want to run self.l1 and self.l2 simultaneously here. A minimal example is as follows:

import torch
import torch.nn as nn
import torch.multiprocessing as mp
import time

mp = mp.get_context('spawn')

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 100)
        self.l2 = nn.Linear(784, 100)

    def forward(self, x):

        processes = []
        num_processes = 2

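        for i in range(num_processes):
            # Pass the callable and its arguments separately; writing
            # target=self.l1(x) runs the layer in the parent process first.
            layer = self.l1 if i == 0 else self.l2
            p = mp.Process(target=layer, args=(x,))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        # Note: the outputs computed in the child processes are discarded.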

if __name__ == '__main__':
    BATCH_SIZE = 64
    x = torch.randn(BATCH_SIZE, 784).cuda()
    mynet = Net().cuda() # ; torch.cuda.synchronize()
    mynet(x)

I found two similar questions: "for loop inside single GPU" and "for loop inside forward".
There are also two related questions, "parallel over samples" and "parallel execution", but the answers to both only restate the original problem.
I also tried nn.Parallel(block1, block2) from the post "two blocks in parallel", but it did not work.
I know that distributing a model across multiple GPUs is well documented, but I have only one GPU here.
Multiprocessing at the model level is documented here: "multiprocessing over model level".
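
As far as I can tell, torch.nn does not ship a Parallel container, so nn.Parallel(block1, block2) would have to be defined by hand. A minimal sketch of what such a helper might look like is below. The class name Parallel and its behaviour (feed the same input to every block and return a list of outputs) are assumptions for illustration, not the code from that post. Note that it only groups the blocks; Python still calls them one after another.

```python
import torch
import torch.nn as nn


class Parallel(nn.Module):
    """Hypothetical container: applies every block to the same input."""

    def __init__(self, *blocks):
        super().__init__()
        # ModuleList registers the blocks so their parameters are tracked.
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        # The blocks are called sequentially from Python's point of view.
        return [block(x) for block in self.blocks]


branches = Parallel(nn.Linear(784, 100), nn.Linear(784, 100))
outs = branches(torch.randn(64, 784))
print([tuple(o.shape) for o in outs])
```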

Any help would be appreciated, thanks a lot!

Hello, did you solve the problem?
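
In case it helps, one option that avoids separate processes is to issue the two layers on separate CUDA streams inside forward. The following is only a sketch under assumptions: the class name StreamNet is made up for illustration, and the CPU branch is a fallback I added so the example also runs without a GPU. Streams allow the two kernels to overlap but do not guarantee it; for layers this small the GPU may still run them back to back.

```python
import torch
import torch.nn as nn


class StreamNet(nn.Module):
    """Hypothetical variant of Net that runs l1 and l2 on two CUDA streams."""

    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(784, 100)
        self.l2 = nn.Linear(784, 100)

    def forward(self, x):
        if not x.is_cuda:
            # Assumed fallback: without a GPU, run the branches in sequence.
            return self.l1(x), self.l2(x)

        current = torch.cuda.current_stream(x.device)
        s1 = torch.cuda.Stream(device=x.device)
        s2 = torch.cuda.Stream(device=x.device)

        # The side streams must wait until x is ready on the current stream.
        s1.wait_stream(current)
        s2.wait_stream(current)

        with torch.cuda.stream(s1):
            y1 = self.l1(x)
        with torch.cuda.stream(s2):
            y2 = self.l2(x)

        # Tell the caching allocator which other streams touch each tensor.
        x.record_stream(s1)
        x.record_stream(s2)
        y1.record_stream(current)
        y2.record_stream(current)

        # The current stream waits for both branches before using the outputs.
        current.wait_stream(s1)
        current.wait_stream(s2)
        return y1, y2


if __name__ == '__main__':
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    x = torch.randn(64, 784, device=device)
    mynet = StreamNet().to(device)
    y1, y2 = mynet(x)
    print(y1.shape, y2.shape)
```

Unlike the mp.Process version, both outputs stay in the same process, so forward can return them directly.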