I hope I’m in the right place to ask this question. I’m new to actually caring about how autograd works, so I’m trying to understand how I can define a new autograd function in the case where I map a matrix to a scalar via intermediate matrix transformations. I’m mainly wondering what is going on mathematically. I’m looking at PyTorch: Defining New autograd Functions — PyTorch Tutorials 1.11.0+cu102 documentation.
I realize that, using autograd, I can calculate the gradient of something like

```python
import torch

def mat2mat_function(matrix):
    return matrix @ matrix

def energy(matrix):
    matrix_sq = mat2mat_function(matrix)
    return torch.trace(matrix_sq.T @ matrix_sq)
```
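and I can then get the gradient automatically with, for example (the 3×3 size here is just for illustration):

```python
X = torch.randn(3, 3, requires_grad=True)
energy(X).backward()  # autograd handles the matrix-to-matrix step for me
print(X.grad)         # gradient of the scalar energy with respect to X
```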
In my intermediate function mat2mat_function in the example above, I input a matrix and am returned its square. It is not clear to me what is going on autograd-wise. What is meant by applying the chain rule to a matrix-to-matrix function? In this case I can of course guess that it means multiplying the current gradient by 2X, since the function is X^2. But what if my matrix-to-matrix function is much more complicated? I looked into Gateaux derivatives, but they give operators between the spaces and not gradients directly. So what is going on, and how would I implement my own custom matrix-to-matrix autograd function?
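To make my guess concrete, here is how I currently picture the chain rule for a matrix-to-matrix step (the notation, with G for the incoming gradient and Y = f(X), is just my own, and the final formula is my own derivation that I’m not sure about):

```latex
% Scalar loss L, intermediate Y = f(X).
% backward() receives G = \partial L / \partial Y and has to return
% \partial L / \partial X, i.e. a vector--Jacobian product:
\[
  \frac{\partial L}{\partial X_{ij}}
    = \sum_{k,l} \frac{\partial L}{\partial Y_{kl}}
                 \frac{\partial Y_{kl}}{\partial X_{ij}}
    = \sum_{k,l} G_{kl}\,\frac{\partial Y_{kl}}{\partial X_{ij}} .
\]
% For Y = X X, working out the sum seems to give
\[
  \frac{\partial L}{\partial X} = G X^{\top} + X^{\top} G ,
\]
% which has the same shape as X but is not an elementwise "2X times G".
```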
I can, following the example, define a class
```python
class matrix_function(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input @ input

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        return grad_output * 2 * input  # correct ???????
```
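For what it’s worth, here is the check I was planning to run with torch.autograd.gradcheck; the class name MatrixSquare and the backward formula grad_output @ X.T + X.T @ grad_output are my own attempt, worked out from the componentwise chain rule above, so please correct me if that formula is wrong:

```python
import torch
from torch.autograd import gradcheck

# Name and backward formula below are my own attempt, not from the tutorial.
class MatrixSquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, X):
        ctx.save_for_backward(X)
        return X @ X

    @staticmethod
    def backward(ctx, grad_output):
        X, = ctx.saved_tensors
        # Hand-derived vector-Jacobian product for Y = X @ X
        # (from the componentwise chain rule above):
        return grad_output @ X.T + X.T @ grad_output

# gradcheck compares the analytic backward against finite differences
# (double precision is recommended for the numerical comparison).
X = torch.randn(4, 4, dtype=torch.double, requires_grad=True)
print(gradcheck(MatrixSquare.apply, (X,)))  # should print True if the backward is right

# Using it inside the energy from before:
Y = MatrixSquare.apply(X)
loss = torch.trace(Y.T @ Y)
loss.backward()
print(X.grad.shape)  # gradient has the same shape as X
```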
If the Gateaux derivative of my intermediate matrix function were some much more complicated thing, which only returns a matrix when evaluated, what do I do then? Imagine that I’m, for instance, solving an equation A(X) = Y, and am inputting X to obtain