Is there a method to parallelize the following PyTorch code?

I have the following PyTorch code that defines a new operation on two tensors, points and primitives:

import torch

def operations(points, primitives):
    # points shape: (batch_size, number_of_points, 3)
    # primitives shape: (batch_size, number_of_primitives, 7)
    batch_size, number_of_points, _ = points.shape
    number_of_primitives = primitives.shape[1]
    gradient = torch.zeros(batch_size, number_of_points, number_of_primitives)
    for i in range(batch_size):
        temp_points = points[i, :, :]
        temp_primitives = primitives[i, :, :]
        temp = torch.zeros(number_of_points, number_of_primitives)
        for k in range(number_of_points):
            for j in range(number_of_primitives):
                # elementwise product with the first 3 primitive entries,
                # plus the next 3 as an offset, then the L2 norm
                temp[k, j] = torch.norm(temp_points[k, :] * temp_primitives[j, :3] + temp_primitives[j, 3:6])
        gradient[i, :, :] = temp
    return gradient
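For what it's worth, one way the three loops could be replaced is with broadcasting, which computes the same values in a single vectorized expression. This is only a sketch assuming the shapes stated above (the function name `operations_vectorized` is mine):

```python
import torch

def operations_vectorized(points, primitives):
    # points: (B, N, 3); primitives: (B, M, 7)
    p = points.unsqueeze(2)                       # (B, N, 1, 3)
    scale = primitives[:, :, :3].unsqueeze(1)     # (B, 1, M, 3)
    offset = primitives[:, :, 3:6].unsqueeze(1)   # (B, 1, M, 3)
    # broadcast to (B, N, M, 3), then L2 norm over the last dimension
    return torch.norm(p * scale + offset, dim=-1)  # (B, N, M)
```

The broadcasted version builds an intermediate tensor of shape (B, N, M, 3), so for very large N and M it trades memory for speed.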

Is there any method to parallelize this code to speed it up? Thanks!

I implemented the serial code above myself, and it is used in a deep-learning project. Is there any way to parallelize it? Every time I run my code, the PyTorch data loader throws an error: RuntimeError: DataLoader worker (pid 255034) is killed by signal: Killed. This happens even though I set the number of workers to zero. Thanks!