Optimise nested for loops in model forward pass

I am working with a deep learning model that generates a mask during the forward pass. Can someone help me optimize these nested for loops?

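            # One single-channel 224x224 mask per image in the batch
            mask = torch.zeros(image.shape[0], 1, 224, 224).cuda()
            for batch in range(image.shape[0]):
                for i in range(224):        # row index (y)
                    for k in range(224):    # column index (x)
                        # Vector from head_point to the pixel (k, i)
                        arr = torch.tensor([k, i], dtype=torch.float32).cuda() - head_point[batch, :]
                        # Cosine of the angle between arr and xy
                        mask[batch, :, i, k] = torch.dot(arr, xy[batch, :]) / (torch.norm(arr, p=2) * torch.norm(xy[batch, :], p=2))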

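This is the equation, where b is the batch index, p = (k, i) is the pixel coordinate, h_b = head_point[b], and v_b = xy[b]:

$$\text{mask}[b, 0, i, k] = \frac{(p - h_b) \cdot v_b}{\lVert p - h_b \rVert_2 \, \lVert v_b \rVert_2}$$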
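
One way the loops above could be vectorised is with broadcasting, so the whole batch is computed in a handful of tensor operations instead of one small operation per pixel. This is only a sketch under some assumptions: head_point and xy are taken to be float tensors of shape (B, 2) in (x, y) order, as the indexing in the loop suggests. The function name cosine_mask and the eps parameter are hypothetical, not part of the original code. Note one deliberate difference: the denominator is clamped by eps, so a pixel that coincides exactly with head_point yields 0 rather than NaN.

```python
import torch


def cosine_mask(head_point, xy, size=224, eps=1e-8):
    """Cosine between xy[b] and (pixel - head_point[b]) for every pixel.

    head_point, xy: float tensors of shape (B, 2), assumed (x, y) order.
    Returns a tensor of shape (B, 1, size, size) on the same device as xy.
    """
    coords = torch.arange(size, device=xy.device, dtype=xy.dtype)
    head_point = head_point.to(xy.dtype)

    # dx[b, 0, k] = k - head_x[b]   -> shape (B, 1, size)
    dx = coords[None, None, :] - head_point[:, 0, None, None]
    # dy[b, i, 0] = i - head_y[b]   -> shape (B, size, 1)
    dy = coords[None, :, None] - head_point[:, 1, None, None]

    # Dot product of (dx, dy) with xy[b]; broadcasts to (B, size, size)
    dot = dx * xy[:, 0, None, None] + dy * xy[:, 1, None, None]

    # Product of the two L2 norms, clamped to avoid division by zero
    denom = torch.sqrt(dx ** 2 + dy ** 2) * xy.norm(dim=1)[:, None, None]

    # Add the channel dimension to match the original mask layout
    return (dot / denom.clamp_min(eps)).unsqueeze(1)
```

In the forward pass this would replace the three loops with `mask = cosine_mask(head_point, xy)`. Because the coordinates are created on the device of xy, no explicit .cuda() call should be needed when the inputs already live on the GPU.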