The output gradient w.r.t the input

(Seungyoung Park) #1

I am using 0.4.0 version of pytorch.

To get the gradient of the output w.r.t. the input, I used the following code.

import torch
import torch.nn as nn

m = nn.Linear(20, 30)
input = torch.randn(128, 20)
input.requires_grad = True
output = m(input).sum()
output.backward()
print(input.grad)  # gradient of the output w.r.t. the input

Am I correct ?


(Thomas V) #2

Looks good to me, but the most idiomatic way to have the input require gradients seems to be

input = torch.randn(128, 20, requires_grad=True)

Best regards


(Seungyoung Park) #4

Thanks for your reply.

Now I can compute it for a single data input, but the process is too slow.

Is there a way to compute it for a whole batch of data at once?

(Thomas V) #5

Have the data minibatch as your input?
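For example, a minimal sketch (the layer size and batch size are just placeholders): since nn.Linear acts on each row independently, summing the output to a scalar and calling backward gives you one gradient row per sample in a single pass.

```python
import torch
import torch.nn as nn

m = nn.Linear(20, 30)
inp = torch.randn(128, 20, requires_grad=True)  # minibatch of 128 samples

out = m(inp).sum()  # scalar: sums over both batch and output dimensions
out.backward()

# inp.grad has the same shape as inp: one gradient row per sample.
print(inp.grad.shape)  # torch.Size([128, 20])
```

Because the layer is applied row-wise, inp.grad[i] is exactly the gradient of sample i's output sum; here every row equals m.weight.sum(dim=0).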

Best regards


(Seungyoung Park) #6

I am sorry for the late response.

My question is whether there is a way to apply .backward() when the output has more than one element, with each element corresponding to one sample in the input batch.

(Thomas V) #7

You can feed a tensor to backward, as in x.backward(weight); this is mathematically equivalent to (x * weight).sum().backward(). There isn't anything that lets you get arbitrary derivatives of vectors (i.e., Jacobians).
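A small sketch of that equivalence (the shapes and the factor of 2 are arbitrary, chosen just to make the check concrete):

```python
import torch

# Two identical leaf tensors so we can compare the two formulations.
x1 = torch.randn(4, 3, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)
w = torch.randn(4, 3)  # the "weight" tensor fed to backward

y1 = x1 * 2
y1.backward(w)  # non-scalar output: pass a gradient tensor

y2 = (x2 * 2 * w).sum()
y2.backward()   # equivalent scalar formulation

print(torch.allclose(x1.grad, x2.grad))  # True
```

In both cases the gradient w.r.t. the input is 2 * w, which is why the two calls agree element-wise.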

Best regards