Hi, guys. I want to know whether the gradients computed by backward are averaged over the whole batch?
1111 (吴秉哲) #1
SimonW (Simon Wang) #2
Assuming you call it as `output.backward()`, where `output` is a scalar, the gradient is just d output / d input. Whether it is an average over the batch depends on how you calculate `output`.
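To make that concrete, here is a minimal sketch, assuming PyTorch and a made-up toy loss. The tensor values and the helper `grad_of` are hypothetical, just for illustration. It reduces the same per-sample losses to a scalar in two ways and compares the gradient that backward leaves on the weight.

```python
import torch

# Hypothetical toy batch of 4 samples sharing a single weight w.
x = torch.tensor([1.0, 2.0, 3.0, 4.0])

def grad_of(reduce):
    """Return d output / d w when the per-sample losses are reduced by `reduce`."""
    w = torch.tensor(2.0, requires_grad=True)
    per_sample = (w * x) ** 2    # one loss value per sample, shape (4,)
    output = reduce(per_sample)  # scalar, so backward() needs no arguments
    output.backward()
    return w.grad.item()

grad_mean = grad_of(torch.mean)  # output is the batch mean
grad_sum = grad_of(torch.sum)    # output is the batch sum

print(grad_mean, grad_sum)
```

With a mean reduction the gradient is the average of the per-sample gradients; with a sum reduction it is their sum, so here it is larger by a factor of the batch size. Backward itself does no averaging; it only differentiates whatever scalar you built.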