How is the gradient in a Linear layer computed using a transpose?

I’m wondering why I need to transpose grad_output when calculating grad_weight.

  • The Linear layer applies a linear transformation to the incoming data: y = xA^T + b
  • What I’m wondering is how the gradient of the weight matrix is computed by code like the following.
    If I define a backward function, I would write it like this:
def backward(ctx, grad_output):
    # Retrieve the tensors that were saved during the forward pass
    input, weight, bias = ctx.saved_tensors

    grad_input = grad_output.mm(weight)       # dL/dx
    grad_weight = grad_output.t().mm(input)   # dL/dW -- why the transpose?
    grad_bias = grad_output.sum(0)            # dL/db, summed over the batch
    return grad_input, grad_weight, grad_bias

weight, bias, and input are cached by the forward pass and retrieved here with ctx.saved_tensors.
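For context, here is a minimal sketch of the matching forward, following the pattern from the PyTorch “Extending torch.autograd” tutorial (the class name is just illustrative):

import torch

class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias):
        # Cache the tensors that backward() will need
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())   # y = x A^T
        if bias is not None:
            output = output + bias      # ... + b
        return output

    # backward() is the function shown above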

  • I know the transpose makes it work, because only after transposing do the shapes line up for the matrix multiply (torch.mm).
  • But is there a mathematical reason for the transpose?

THANKS!

Hi,

If you write down the value of each entry of the gradient, you will see that the indices are flipped: with y = x W^T you have y[n, i] = sum_j x[n, j] * W[i, j], so dL/dW[i, j] = sum_n grad_output[n, i] * input[n, j], which is exactly (grad_output^T · input)[i, j]. That flipped index pair is why, in code, you need to add a transpose.
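If you want to convince yourself numerically, you can compare the hand-written formula with what autograd computes for the same forward (a quick sanity check; the shapes below are arbitrary):

import torch

# Small example: batch of 4, in_features=3, out_features=2
x = torch.randn(4, 3)
W = torch.randn(2, 3, requires_grad=True)

y = x.mm(W.t())                      # y = x A^T
grad_output = torch.randn_like(y)    # pretend this is dL/dy from the next layer
y.backward(grad_output)              # let autograd compute dL/dW

manual_grad_W = grad_output.t().mm(x)          # the transposed product from your backward
print(torch.allclose(W.grad, manual_grad_W))   # True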

Thanks a lot, I will give it a try.