Hi folks,

There is a problem that has bothered me for quite a long time. Suppose we are minimizing a loss function L(θ; x_i), parameterized by θ, on samples {x_1, …, x_M} using SGD, where M is the mini-batch size. Since PyTorch autograd can only implicitly create gradients for **scalar** outputs, I am wondering if there is an efficient way to compute the gradient for **each sample**, i.e., ∇_θ L(θ; x_i), without setting the batch size to 1 and computing the gradients in a for loop (which is too slow)?
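For reference, here is a minimal sketch of the slow baseline I mean (the model, loss, and shapes are just illustrative, not my actual setup): looping over the mini-batch and calling `torch.autograd.grad` on each sample's scalar loss separately.

```python
import torch

torch.manual_seed(0)

# Toy linear model with parameter vector theta (illustrative only).
theta = torch.randn(3, requires_grad=True)
X = torch.randn(8, 3)   # mini-batch of M = 8 samples
y = torch.randn(8)

# Slow baseline: one backward pass per sample.
per_sample_grads = []
for i in range(X.shape[0]):
    loss_i = (X[i] @ theta - y[i]) ** 2          # scalar loss for sample i
    (grad_i,) = torch.autograd.grad(loss_i, theta)
    per_sample_grads.append(grad_i)

per_sample_grads = torch.stack(per_sample_grads)  # shape: (M, num_params)
print(per_sample_grads.shape)                     # torch.Size([8, 3])
```

This gives M separate gradients, but requires M backward passes per mini-batch, which is exactly the overhead I would like to avoid.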

Thank you for your help!