How does PyTorch handle mini-batch training?

When we pass a mini-batch to an ANN in PyTorch, how is the gradient computed on this mini-batch, and how are the network parameters updated?
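To make the question concrete, here is the kind of standard training step I have in mind (a minimal sketch with a made-up toy model and data; the model, shapes, and hyperparameters are just placeholders, not my actual code):

```python
import torch
import torch.nn as nn

# Toy model and data, only to illustrate a single mini-batch step.
model = nn.Linear(10, 1)                                   # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(32, 10)                                    # mini-batch of 32 samples
y = torch.randn(32, 1)

optimizer.zero_grad()
out = model(x)                   # one forward pass over the whole batch
loss = criterion(out, y)         # loss averaged over the batch (default reduction)
loss.backward()                  # gradients w.r.t. the (single?) parameter set
optimizer.step()                 # one update of that parameter set
```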

In particular, my starting point is my previous post, where I provided a snippet:
Different outputs for identical sequences in a batch

In this thread, I showed (I guess) that the network has as much as versions of its parameters as the size of the batch (one version per element of the batch dimension). To come to this conclusion I just trained my LSTM and I finally pass it a batch with identical sequences and the output of the network turned out different. It turned out that the outputs are different over the batch dimension of the output although the elements of the mini-bath passed to the network are identical.