The gradients computed by backward()

Hi, guys. I want to know whether the gradients computed by backward() are averaged over the whole batch?

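Assuming you call it as output.backward(), where output is a scalar, the gradient is just d output / d input. Whether it is an average over the batch depends on how you calculate output: if output is the mean of the per-sample values, the gradient is averaged over the batch; if output is their sum, the gradient is summed instead.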
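
To make that concrete, here is a minimal sketch comparing the two reductions. The tensors x and w and the toy "model" x @ w are made up for illustration; they are not from the question.

```python
import torch

# Hypothetical toy batch: 4 samples, 3 features each.
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [10.0, 11.0, 12.0]])
w = torch.ones(3, requires_grad=True)

# Case 1: output is the MEAN over the batch.
output = (x @ w).mean()      # scalar
output.backward()
grad_mean = w.grad.clone()   # equals the per-feature mean of x

# Gradients accumulate in w.grad, so reset before the second pass.
w.grad.zero_()

# Case 2: output is the SUM over the batch.
output = (x @ w).sum()       # scalar
output.backward()
grad_sum = w.grad.clone()    # equals the per-feature sum of x

print(grad_mean)
print(grad_sum)
```

With a batch of 4, grad_sum comes out 4 times grad_mean. backward() itself does no averaging; it only differentiates whatever reduction you used to build output. Note that many built-in loss functions default to reduction='mean', which is why gradients often end up batch-averaged in practice.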