Hi, guys. I want to know whether the gradients computed by backward are averaged over the whole batch?
Assuming you call backward on output and output is a scalar, the gradient is just d output / d input. Whether it is an average over the batch depends on how you calculate the output.
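To make that concrete, here is a minimal sketch. The tensors `x` and `w` and the toy linear model are made up for illustration and are not from the original post. It reduces the same per-sample values to a scalar in two ways, once with `mean()` and once with `sum()`, and compares the gradients that backward produces.

```python
import torch

# Hypothetical toy data: a batch of 4 samples with 2 features each.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
w = torch.ones(2, requires_grad=True)

# Case 1: reduce the per-sample outputs with mean(), then call backward.
output_mean = (x @ w).mean()
output_mean.backward()
grad_mean = w.grad.clone()  # averaged over the batch: [4., 5.]

# Gradients accumulate in w.grad, so reset before the second pass.
w.grad.zero_()

# Case 2: reduce with sum() instead.
output_sum = (x @ w).sum()
output_sum.backward()
grad_sum = w.grad.clone()  # summed over the batch: [16., 20.]

batch_size = x.shape[0]
print(torch.allclose(grad_sum, grad_mean * batch_size))  # True
```

So backward itself does not average anything. The factor of 1 / batch_size shows up only because `mean()` is part of the computation that produced the scalar.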