Hello,

I am trying to use gradient descent to learn the affine matrix values necessary to minimize a cost function.

In short, I have n pairs of images and a tensor of size [n, 3, 3] — one identity matrix for each pair. For each pair of images, I use the corresponding matrix to create an affine transformation of those images. I then calculate a full normalized covariance (ncov) matrix of the two images. Importantly, I cannot use PyTorch for this step. The cost is the distance between the location of the largest value in the ncov matrix and the center of the ncov array, which would correspond to a perfect overlay. This distance (loss) is returned as a single scalar value.
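For concreteness, the non-torch part of my pipeline looks roughly like the sketch below (simplified: `ncov_peak_loss` and the FFT-based cross-covariance are stand-ins for my actual code):

```python
import numpy as np

def ncov_peak_loss(img_a, img_b):
    # Simplified stand-in for my real computation: normalize both images,
    # compute the full cross-covariance surface, locate its peak, and
    # return the peak's distance from the center of the surface.
    a = (img_a - img_a.mean()) / (img_a.std() + 1e-8)
    b = (img_b - img_b.mean()) / (img_b.std() + 1e-8)
    s = (a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1)
    fa = np.fft.rfft2(a, s)
    fb = np.fft.rfft2(b, s)
    # fftshift moves the zero-lag term to the center of the surface
    ncov = np.fft.fftshift(np.fft.irfft2(fa * np.conj(fb), s))
    peak = np.array(np.unravel_index(np.argmax(ncov), ncov.shape))
    center = np.array([(d - 1) / 2 for d in ncov.shape])
    return float(np.linalg.norm(peak - center))  # 0.0 for a perfect overlay
```

So for identical, perfectly aligned images the peak sits at the center and the loss is zero, and it grows as the peak drifts off-center.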

Because I can't use PyTorch to calculate the ncov, I break the computational graph and cannot calculate the gradients. I gather that I can make a subclass of `torch.autograd.Function` (described here); however, it is very unclear to me how, specifically, to write the `backward()` method to ensure that the new `Function` works properly with the autograd engine.
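My current understanding is that `backward()` must return one gradient per input of `forward()`, shaped like that input. Since my loss isn't differentiable through torch, the only route I can see is estimating that gradient numerically inside `backward()`, something like this sketch (the quadratic `numpy_loss` is just a placeholder for my real ncov loss):

```python
import numpy as np
import torch

class NumpyLoss(torch.autograd.Function):
    """Loss computed outside torch (here with NumPy), with gradients
    estimated by central finite differences in backward()."""

    @staticmethod
    def numpy_loss(theta_np):
        # Placeholder non-torch computation; stands in for the
        # affine-transform + ncov pipeline.
        return float(np.sum((theta_np - 1.0) ** 2))

    @staticmethod
    def forward(ctx, theta):
        ctx.save_for_backward(theta)
        loss = NumpyLoss.numpy_loss(theta.detach().cpu().numpy())
        return theta.new_tensor(loss)

    @staticmethod
    def backward(ctx, grad_output):
        # Return one gradient per forward() input, shaped like theta.
        (theta,) = ctx.saved_tensors
        theta_np = theta.detach().cpu().numpy().astype(np.float64)
        eps = 1e-4
        grad = np.zeros_like(theta_np)
        it = np.nditer(theta_np, flags=["multi_index"])
        for _ in it:
            idx = it.multi_index
            plus = theta_np.copy();  plus[idx] += eps
            minus = theta_np.copy(); minus[idx] -= eps
            grad[idx] = (NumpyLoss.numpy_loss(plus)
                         - NumpyLoss.numpy_loss(minus)) / (2 * eps)
        return grad_output * theta.new_tensor(grad)

# Usage: n = 2 pairs, one 3x3 matrix per pair, initialized to identity
theta = torch.eye(3).repeat(2, 1, 1).requires_grad_()
loss = NumpyLoss.apply(theta)
loss.backward()  # theta.grad now holds the finite-difference estimate
```

Is this the intended pattern, or is there a better way when the forward pass is a black box?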

It seems like the values that should be returned from the `backward()` method depend heavily on which non-torch computations are used. Am I overthinking this? If not, is there a good reference explaining how `forward()`/`backward()` should be structured for unconventional loss functions like this?
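In case it helps anyone frame an answer: to understand the contract, I have been experimenting with toy `Function`s and using `torch.autograd.gradcheck` to verify that a hand-written `backward()` matches numerical gradients (the `Square` example below is obviously not my real loss):

```python
import torch

class Square(torch.autograd.Function):
    # Toy Function: forward computes x**2 outside of autograd's view;
    # backward returns the hand-written derivative 2*x, scaled by the
    # incoming gradient, as autograd's chain rule requires.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x

# gradcheck compares backward() against finite differences; it wants
# double-precision inputs with requires_grad=True
x = torch.randn(4, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(Square.apply, (x,), eps=1e-6, atol=1e-4)
```

That at least tells me when a `backward()` is wrong, but not how to derive one when the forward pass can't be expressed in torch at all.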