Hi!

I hope I’m in the right place to ask this question. I’m new to actually caring about how autograd works, so I’m trying to understand how I can define a new autograd function in the case where I map a matrix to a scalar using intermediate matrix transformations. I’m mainly wondering what mathematically is going on. I’m looking at PyTorch: Defining New autograd Functions — PyTorch Tutorials 1.11.0+cu102 documentation.

I realize that I can use autograd to calculate the gradient of something like `energy` below:

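```
import torch

def mat2mat_function(matrix):
    return matrix @ matrix  # matrix-to-matrix map X -> X @ X

def energy(matrix):
    matrix_sq = mat2mat_function(matrix)  # intermediate matrix
    return torch.trace(matrix_sq.T @ matrix_sq)  # scalar output
```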
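To make concrete what autograd returns here, this is a minimal sketch that compares `X.grad` with a hand-derived gradient. The test matrix `X`, its size, and the names `grad_autograd` and `grad_by_hand` are assumptions made for illustration. The hand derivation uses dE/dY = 2Y for E = trace(Y^T Y) and pulls that matrix back through Y = X @ X.

```python
import torch

def mat2mat_function(matrix):
    # Matrix-to-matrix step: X -> X @ X
    return matrix @ matrix

def energy(matrix):
    # Scalar output: trace(Y^T Y) with Y = mat2mat_function(matrix)
    matrix_sq = mat2mat_function(matrix)
    return torch.trace(matrix_sq.T @ matrix_sq)

torch.manual_seed(0)
# Arbitrary test input (assumption: any square matrix works here)
X = torch.randn(3, 3, dtype=torch.float64, requires_grad=True)

E = energy(X)
E.backward()            # autograd applies the chain rule through both steps
grad_autograd = X.grad  # dE/dX, same shape as X

# Hand-derived check: G = dE/dY = 2Y, and pulling G back through
# Y = X @ X gives G @ X^T + X^T @ G (not an elementwise product).
Xd = X.detach()
G = 2 * (Xd @ Xd)
grad_by_hand = G @ Xd.T + Xd.T @ G

print(torch.allclose(grad_autograd, grad_by_hand))
```

The gradient has the same shape as `X`, and each intermediate step only ever passes back a matrix of the same shape as its own input.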

My intermediate function `mat2mat_function` in the above example takes a matrix as input and returns its square. It is not clear to me what is going on autograd-wise. What is meant by applying the chain rule to a matrix-to-matrix function? In this case, I can of course guess that it means multiplying the current gradient by 2X, since the function is X^2. But what if my matrix-to-matrix function is much more complicated? I looked into Gateaux derivatives, but they give linear operators between the spaces rather than gradients directly. So what is going on, and how would I implement my own custom matrix-to-matrix autograd function?

I can, following the example, define a class

```
import torch

class matrix_function(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input @ input

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        return grad_output * 2 * input  # correct ???????
```
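For comparison, here is a hedged sketch of a `backward` that passes a numerical check. In `backward`, `grad_output` is the gradient G of the final scalar with respect to the output Y, and the return value must be the gradient with respect to the input X. For Y = X @ X the Gateaux derivative in a direction H is H @ X + X @ H, and the adjoint of that linear map applied to G is G @ X^T + X^T @ G, which differs from the elementwise guess `grad_output * 2 * input` unless the matrices happen to commute suitably. The class name `MatrixSquare` and the test setup are assumptions for illustration.

```python
import torch

class MatrixSquare(torch.autograd.Function):
    """Illustrative custom Function for Y = X @ X (hypothetical name)."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input @ input

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        # Adjoint of the map H -> H @ X + X @ H, applied to grad_output
        return grad_output @ input.T + input.T @ grad_output

def energy(Y):
    # Same scalar as in the question: trace(Y^T Y)
    return torch.trace(Y.T @ Y)

torch.manual_seed(0)
X = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)

# Finite-difference check of the custom backward (double precision recommended)
ok = torch.autograd.gradcheck(MatrixSquare.apply, (X,))

# Compare against autograd differentiating X @ X directly
Xa = X.detach().clone().requires_grad_(True)
Xb = X.detach().clone().requires_grad_(True)
energy(MatrixSquare.apply(Xa)).backward()
energy(Xb @ Xb).backward()
same = torch.allclose(Xa.grad, Xb.grad)
print(ok, same)
```

`torch.autograd.gradcheck` is a convenient way to test any hand-written `backward` before trusting it inside a larger model.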

If the Gateaux derivative of my intermediate matrix function were something much more complicated, which only returns a matrix once it is evaluated, what do I do then? Imagine, for instance, that I’m solving an equation `A(X) = Y`, inputting `X` to obtain `Y`.
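One way to think about the harder case is that `backward` never needs the full derivative operator as an object. It only needs the adjoint of that operator applied to `grad_output`. The sketch below assumes, purely for illustration, that the step is a linear solve `M @ Y = X` with a fixed, well-conditioned matrix `M`; this is a hypothetical stand-in for the more complicated equation, and the class name `SolveFixedOperator` is made up. The forward derivative is dY = M^{-1} dX, so its adjoint maps G to M^{-T} G, which is one more solve with the transposed matrix.

```python
import torch

class SolveFixedOperator(torch.autograd.Function):
    """Hypothetical example: Y solves M @ Y = X for a fixed matrix M."""

    @staticmethod
    def forward(ctx, X, M):
        Y = torch.linalg.solve(M, X)
        ctx.save_for_backward(M)
        return Y

    @staticmethod
    def backward(ctx, grad_output):
        (M,) = ctx.saved_tensors
        # Forward derivative: dY = M^{-1} dX. Its adjoint sends G to M^{-T} G,
        # computed here as a solve with the transposed matrix.
        grad_X = torch.linalg.solve(M.T, grad_output)
        return grad_X, None  # M is treated as a constant in this sketch

torch.manual_seed(0)
n = 4
# Assumption: M is shifted by a multiple of the identity to keep it well-conditioned
M = torch.randn(n, n, dtype=torch.float64) + 4.0 * torch.eye(n, dtype=torch.float64)
X = torch.randn(n, n, dtype=torch.float64, requires_grad=True)

# Finite-difference check of the hand-written backward with respect to X
ok = torch.autograd.gradcheck(lambda inp: SolveFixedOperator.apply(inp, M), (X,))
print(ok)
```

For a nonlinear equation the same pattern would presumably apply after linearizing at the computed solution, so that `backward` performs an adjoint solve with the linearized operator instead of differentiating through the solver's iterations.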