Why do I need to transpose grad_output when computing grad_weight?
- The linear layer applies a linear transformation to the incoming data: y = xA^T + b.
- What I'm wondering is how the gradient of the weight matrix can be computed by code like the following.

If I define a backward function, I would write it like this:
```python
def backward(ctx, grad_output):
    input, weight, bias = ctx.saved_tensors
    grad_input = grad_output.mm(weight)      # dL/dx, shape (N, in_features)
    grad_weight = grad_output.t().mm(input)  # dL/dA, shape (out_features, in_features) -- note the transpose
    grad_bias = grad_output.sum(0)           # dL/db, shape (out_features,)
    return grad_input, grad_weight, grad_bias
```
weight, bias, and input are saved during the forward pass and retrieved via ctx.saved_tensors for reuse in backward.
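For completeness, here is a runnable version of what I mean (the class name LinearFunction and the tensor sizes are just my example), verified against numerical gradients with torch.autograd.gradcheck:

```python
import torch

class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias):
        ctx.save_for_backward(input, weight, bias)
        return input.mm(weight.t()) + bias  # y = xA^T + b

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_output.mm(weight)
        grad_weight = grad_output.t().mm(input)
        grad_bias = grad_output.sum(0)
        return grad_input, grad_weight, grad_bias

# gradcheck needs double precision; it compares the analytic backward
# above against finite-difference gradients.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(2, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(LinearFunction.apply, (x, w, b)))  # True
```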
- I can see that the code is correct shape-wise: only by transposing grad_output do the shapes become compatible for the matrix multiply (torch.mm); see the shape check after this list.
- But is there a mathematical reason for the transpose, beyond making the shapes line up?
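To make the shape argument concrete, here is a quick check (the sizes N=4, in_features=3, out_features=2 are arbitrary):

```python
import torch

x  = torch.randn(4, 3)   # input:       (N, in_features)
w  = torch.randn(2, 3)   # weight A:    (out_features, in_features)
gy = torch.randn(4, 2)   # grad_output: (N, out_features), same shape as y = x.mm(w.t()) + b

# grad_weight must match weight's shape, (out_features, in_features) = (2, 3):
print(gy.t().mm(x).shape)  # torch.Size([2, 3])

# Without the transpose the inner dimensions don't line up:
# gy.mm(x)  # RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x2 and 4x3)
```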
THANKS!