Non-scalar backward and a self-implemented mini-batch

My question relates to this one.

Quote from SimonW:

> All autograd does is calculate the gradient; it has no notion of batching, and I don’t see how it can have different behavior with different batching mechanisms.
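To illustrate the point in the quote, here is a minimal sketch (my own example, not from the original post) showing that autograd treats a batch like any other tensor: calling `backward()` on a non-scalar output requires an explicit `gradient` argument, and passing a tensor of ones is equivalent to summing the output first. Either way, the gradient accumulated in `x.grad` is the same regardless of how the "batch" was formed.

```python
import torch

# A non-scalar output: one value per "sample" in a mini-batch of 3.
x = torch.ones(3, requires_grad=True)
y = x * 2  # y has shape (3,), so y.backward() alone would raise an error

# Supplying gradient=ones_like(y) is equivalent to y.sum().backward():
# autograd just computes a vector-Jacobian product, with no notion of batching.
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # tensor([2., 2., 2.])

# Same result via an explicit scalar reduction over the batch.
x2 = torch.ones(3, requires_grad=True)
(x2 * 2).sum().backward()
print(torch.equal(x.grad, x2.grad))  # True
```

The choice of `gradient` vector is up to the caller; ones reproduce the summed loss, while other weightings (e.g. `1/N`) reproduce a mean over the batch.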