The arguments of .backward()

Yifan_Xu · July 26, 2019, 3:53am

I kind of understand that calling .backward() with no argument only works when the tensor it is called on is a scalar; otherwise there would be an error. However, today I read some code that calls .backward() with no argument on what looks like a vector tensor, and it works. Does anyone know why this works?

Here is the snippet:

loss = F.binary_cross_entropy_with_logits(xbhat, xb, size_average=False) / B
loss.backward()

where xbhat holds the predicted values and xb is the original MNIST data.

Mazhar_Shaikh · July 27, 2019, 11:36am

Hi Yifan_Xu,
Backward for a vector tensor certainly does not work without the input gradient.
One reason the above code may be working is that the size_average argument has been deprecated in the current version of PyTorch in favour of the reduction argument, which has a default value of ‘mean’. Hence, the above code may unintentionally be calculating the gradient of a scalar tensor.
Hope this helps.
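
To illustrate the scalar versus vector point, here is a minimal sketch. The tensor x and its values are made up for illustration and are not taken from the code above; it only assumes a recent PyTorch install.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y is a vector (non-scalar), so backward() with no argument is rejected.
y = x * 2
try:
    y.backward()
    raised = False
except RuntimeError:
    raised = True

# Passing a gradient with the same shape as y works: autograd computes
# a vector-Jacobian product with that vector.
y = x * 2
y.backward(gradient=torch.ones_like(y))
grad_vector = x.grad.clone()

# Reducing to a scalar first lets backward() run with no argument.
x.grad.zero_()
(x * 2).sum().backward()
grad_scalar = x.grad.clone()

print(raised, grad_vector, grad_scalar)
```

Passing a vector of ones as the gradient gives the same result as summing first, which is why a loss that has already been reduced to a single number needs no argument.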

Yifan_Xu · July 29, 2019, 7:08pm

Hey Mazhar_Shaikh,

Thanks for your reply. I finally understand what was going on under the hood. I didn’t make it clear that this code was from a neural net. That means that although xb is passed in as a vector, it is not processed as a vector. Say we have xb = [x1, x2, x3] and a simple MLP that outputs xbhat = g(w0b + w1x1 + w2x2 + w3x3). As you can easily tell, x1, x2, and x3 are handled separately, not as a whole. That’s why using .backward() without an argument is legal here.
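
For anyone who wants to check this on their own model, one quick test is to print the shape of loss before calling .backward(). The sketch below is not the original network: the Linear layer, the sizes B and D, and the random targets are stand-ins I made up, and I use reduction="sum" on the assumption that it is the non-deprecated equivalent of size_average=False.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, D = 4, 6                        # hypothetical batch size and feature count
xb = torch.rand(B, D).round()      # stand-in binary targets (not real MNIST)
layer = torch.nn.Linear(D, D)      # stand-in for the network
xbhat = layer(xb)                  # logits with shape (B, D)

# Summed loss divided by the batch size: a single number.
loss = F.binary_cross_entropy_with_logits(xbhat, xb, reduction="sum") / B

# Unreduced loss for comparison: one value per element.
per_element = F.binary_cross_entropy_with_logits(xbhat, xb, reduction="none")

print(loss.shape)         # torch.Size([])
print(per_element.shape)  # torch.Size([4, 6])

loss.backward()           # allowed because loss is zero-dimensional
```

If loss.shape prints as torch.Size([]), the tensor is a scalar and .backward() needs no argument, whatever the shapes of xb and xbhat are.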