I am working on a deep learning model that generates a mask during the forward pass. Can someone help me optimize (ideally vectorize) these nested for loops?

```
# For every pixel (k, i): cosine of the angle between (pixel - head_point) and xy
mask = torch.zeros(image.shape[0], 1, 224, 224).cuda()
for batch in range(image.shape[0]):
    for i in range(224):        # row index (y)
        for k in range(224):    # column index (x)
            arr = torch.tensor([k, i], dtype=torch.float32).cuda() - head_point[batch, :]
            mask[batch, :, i, k] = torch.dot(arr, xy[batch, :]) / (torch.norm(arr, p=2) * torch.norm(xy[batch, :], p=2))
```
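
For comparison, here is a minimal sketch of how the three loops might be collapsed into one broadcasted computation. It assumes `head_point` and `xy` are float tensors of shape `(B, 2)` in `(x, y)` order, which is what the indexing above suggests but is not stated explicitly. The function name `cosine_mask` and the `size` parameter are hypothetical, introduced only for illustration.

```python
import torch

def cosine_mask(head_point, xy, size=224):
    """Vectorized sketch of the triple loop (assumes head_point, xy: (B, 2))."""
    coords = torch.arange(size, dtype=torch.float32, device=head_point.device)
    # xs[i, k] = k (column) and ys[i, k] = i (row), matching torch.tensor([k, i])
    xs = coords.view(1, size).expand(size, size)
    ys = coords.view(size, 1).expand(size, size)
    grid = torch.stack((xs, ys), dim=-1)                    # (size, size, 2)

    # Broadcast over the batch: arr[b, i, k] = [k, i] - head_point[b]
    arr = grid.unsqueeze(0) - head_point[:, None, None, :]  # (B, size, size, 2)

    # Dot product with xy[b] at every pixel, then divide by both norms
    dots = (arr * xy[:, None, None, :]).sum(dim=-1)         # (B, size, size)
    norms = arr.norm(dim=-1) * xy.norm(dim=-1)[:, None, None]
    return (dots / norms).unsqueeze(1)                      # (B, 1, size, size)
```

Called as `mask = cosine_mask(head_point, xy)`, the result lives on the same device as `head_point`, so no explicit `.cuda()` call is needed. As in the loop version, a pixel that coincides exactly with `head_point` has zero norm and yields NaN; clamping `norms` with a small epsilon would avoid that if it matters for the model.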

This is the equation
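
Read off from the loop above, and assuming the code is a faithful implementation, each mask entry is the cosine similarity between the offset from the head point to the pixel and the direction vector. Here $p_{ik} = (k, i)$ is the pixel coordinate, $h_b$ stands for `head_point[b]`, and $v_b$ stands for `xy[b]`; these symbols are introduced here for illustration only.

$$
\text{mask}[b, 0, i, k] = \frac{(p_{ik} - h_b) \cdot v_b}{\lVert p_{ik} - h_b \rVert_2 \, \lVert v_b \rVert_2}
$$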