The gradients computed by backward()


(吴秉哲) #1

Hi, guys. I want to know whether the gradients computed by backward() are averaged over the whole batch?


(Simon Wang) #2

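Assuming you call it as output.backward(), where output is a scalar, the gradient is just d output / d input. Whether it is an average over the batch depends on how you compute output: if output is a mean over the batch, the gradient is averaged; if it is a sum, the gradient is summed.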
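
To make that concrete, here is a minimal sketch comparing the two reductions. The tensors x and w and the linear "model" x @ w are made up for illustration and are not from the original question; the point is only that the reduction used to get the scalar decides whether the gradient is an average.

```python
import torch

# Hypothetical toy batch: 4 samples, 3 features each (illustrative values).
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [1.0, 0.0, 1.0]])
w = torch.ones(3, requires_grad=True)

# Case 1: the scalar is the MEAN of the per-sample outputs.
(x @ w).mean().backward()
grad_mean = w.grad.clone()

# Gradients accumulate in w.grad, so reset before the second backward pass.
w.grad.zero_()

# Case 2: the scalar is the SUM of the per-sample outputs.
(x @ w).sum().backward()
grad_sum = w.grad.clone()

batch_size = x.shape[0]
print(grad_mean)  # average of the per-sample gradients
print(grad_sum)   # sum of the per-sample gradients
print(torch.allclose(grad_sum, grad_mean * batch_size))
```

With the mean, the gradient with respect to w is the average of the per-sample gradients; with the sum, it is batch_size times larger. Many built-in loss functions take a reduction argument that defaults to 'mean', which is why the gradient usually ends up averaged in practice.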