More efficient face alignment using an affine/similarity transform in PyTorch

I have been looking for a way to do this entirely with torch tensors but have not found one, so I am posting here in case someone with more expertise can help.
I am running inference with a face detection model, and the detected face then needs to be aligned before it is passed to a recognition model.
The current flow is: cv2 image → torch → detection model → torch landmarks → numpy (CPU) → estimate transform with cv2 or skimage (CPU) → apply transform to the cv2 image (CPU) → torch for the next model

All torch tensor operations are done on the GPU. I would like to avoid going back and forth between numpy and torch.

To do this I would need torch equivalents of the two cv2 functions below:

        tfm, _ = cv2.estimateAffinePartial2D(src_pts.cpu().numpy(), ref_pts)
        face_img = cv2.warpAffine(image, tfm, out_size)

Are there any such functions? I have found ways to convert the transform matrix into a form that F.affine_grid accepts, but that still requires generating the transformation matrix in numpy. I would like to avoid this altogether by estimating the similarity transform directly from torch tensors and applying it to a tensor.
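
For the estimation half, one option that stays on the GPU is the closed-form least-squares similarity fit (Umeyama's method), which needs only means and one 2x2 SVD. The following is a minimal sketch, not a drop-in replacement: `estimate_similarity` is a hypothetical name, it assumes `src` and `dst` are `(N, 2)` float tensors on the same device, and it does no outlier rejection, whereas cv2.estimateAffinePartial2D uses RANSAC by default, so the two can disagree when some landmarks are bad.

```python
import torch


def estimate_similarity(src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    """Least-squares similarity transform (Umeyama) mapping src onto dst.

    src, dst: (N, 2) float tensors on the same device.
    Returns a (2, 3) matrix [s*R | t] in pixel coordinates.
    """
    n = src.shape[0]
    src_mean, dst_mean = src.mean(dim=0), dst.mean(dim=0)
    src_c, dst_c = src - src_mean, dst - dst_mean

    # Cross-covariance between the centered destination and source points.
    cov = dst_c.T @ src_c / n
    U, S, Vh = torch.linalg.svd(cov)

    # Reflection guard: flip the last axis if U @ Vh would be a reflection.
    # Written without a Python `if` so no GPU -> CPU sync is needed.
    sign = torch.sign(torch.linalg.det(U) * torch.linalg.det(Vh))
    d = torch.stack([torch.ones_like(sign), sign])

    R = U @ torch.diag(d) @ Vh                      # rotation, (2, 2)
    src_var = src_c.pow(2).sum() / n                # total variance of src
    scale = (S * d).sum() / src_var                 # isotropic scale
    t = dst_mean - scale * (R @ src_mean)           # translation, (2,)
    return torch.cat([scale * R, t[:, None]], dim=1)
```

Usage would look like `tfm = estimate_similarity(src_pts, ref_pts_tensor)`, where `ref_pts_tensor` is assumed to be the reference landmarks already moved to the same device and dtype as `src_pts`.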

Follow my script face_alignment_torch.py · GitHub
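
For the cv2.warpAffine half of the question, F.affine_grid and F.grid_sample can do the resampling, but they expect a matrix that maps normalized output coordinates in [-1, 1] to normalized input coordinates, while cv2 takes a pixel-space matrix in the forward (source to destination) direction. The sketch below does that conversion on the tensor side. `warp_affine` and `_pixel_to_norm` are hypothetical names; it assumes a `(B, C, H, W)` float image, a `(B, 2, 3)` pixel-space matrix, `out_size` given as `(height, width)` (cv2 uses `(width, height)`), and `align_corners=False` throughout. Border handling and interpolation details may differ slightly from cv2.

```python
import torch
import torch.nn.functional as F


def _pixel_to_norm(h: int, w: int, dtype, device) -> torch.Tensor:
    # Maps pixel-center indices to [-1, 1] under the align_corners=False convention.
    return torch.tensor(
        [[2.0 / w, 0.0, 1.0 / w - 1.0],
         [0.0, 2.0 / h, 1.0 / h - 1.0],
         [0.0, 0.0, 1.0]],
        dtype=dtype, device=device,
    )


def warp_affine(image: torch.Tensor, M: torch.Tensor, out_size) -> torch.Tensor:
    """Warp `image` (B, C, H, W) with forward pixel-space matrices `M` (B, 2, 3)."""
    B, C, H, W = image.shape
    out_h, out_w = out_size

    # Promote to 3x3 so the matrix can be inverted and composed.
    bottom = M.new_tensor([0.0, 0.0, 1.0]).expand(B, 1, 3)
    M3 = torch.cat([M, bottom], dim=1)

    n_in = _pixel_to_norm(H, W, M.dtype, M.device)
    n_out = _pixel_to_norm(out_h, out_w, M.dtype, M.device)

    # Output-normalized -> output pixels -> input pixels -> input-normalized.
    theta = n_in @ torch.linalg.inv(M3) @ torch.linalg.inv(n_out)
    theta = theta[:, :2, :].to(image.dtype)

    grid = F.affine_grid(theta, (B, C, out_h, out_w), align_corners=False)
    return F.grid_sample(image, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=False)
```

With a single face, the matrix would be passed as `tfm[None]` and the image as a `(1, C, H, W)` float tensor, so the aligned crop never leaves the device.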