How to use DataParallel in backward?

Oh, thanks, and please forgive me: this is my first time posting a topic on a coding forum, and I won't double-post again.
I printed the loss and found it is a scalar, so I'm curious how DataParallel works in the backward pass.
The docs say:
Data Parallelism is when we split the mini-batch of samples into multiple smaller mini-batches and run the computation for each of the smaller mini-batches in parallel.
So I would expect the loss to be a tensor of shape (8, 1), with each smaller mini-batch contributing its own loss. Why is there only one scalar?
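To make my mental model concrete, here is a toy pure-Python sketch of how I picture the split (the names `scatter`, `toy_model`, and `mse` are my own illustrations, not the real DataParallel internals, and there are no real GPUs involved):

```python
def scatter(batch, n_devices):
    """Split one mini-batch into n_devices smaller mini-batches."""
    chunk = (len(batch) + n_devices - 1) // n_devices
    return [batch[i * chunk:(i + 1) * chunk] for i in range(n_devices)]

def toy_model(samples):
    """Stand-in model: one prediction per input sample."""
    return [x * 2.0 for x in samples]

def mse(preds, targets):
    """Mean squared error over one smaller mini-batch."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]        # mini-batch of 8
targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

# The way I imagine it: each smaller mini-batch produces its OWN loss,
# so with 8 devices I would expect 8 losses, not one scalar.
per_device_losses = [
    mse(toy_model(chunk), tgt)
    for chunk, tgt in zip(scatter(batch, 8), scatter(targets, 8))
]
print(len(per_device_losses))  # 8 losses in my mental model
```

That is exactly why the single scalar I actually see surprises me.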