Batch size vs effectiveness of training

I noticed that when training a network, the larger the batch size, the less effectively the weights are trained, although the training itself runs faster.
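
One possible explanation for the observation above is that a larger batch means fewer weight updates per pass over the data. Here is a minimal sketch of that effect in plain Python rather than PyTorch. The function `train_one_epoch`, the toy data, the one-weight model `y = w * x`, and the learning rate are all assumptions made up for illustration. The comments note which PyTorch pieces (`DataLoader`, `optimizer.step()`) each step is meant to mimic.

```python
import random


def train_one_epoch(xs, ys, w, batch_size, lr):
    """One pass over the data with mini-batch SGD on the model y = w * x.

    Returns the updated weight and the number of weight updates performed.
    """
    indices = list(range(len(xs)))
    random.shuffle(indices)  # analogous to DataLoader(..., shuffle=True)
    updates = 0
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # Gradient of the squared error w.r.t. w, averaged over the batch.
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
        w -= lr * grad  # one update per batch, like optimizer.step()
        updates += 1
    return w, updates


random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(256)]
ys = [3.0 * x for x in xs]  # toy data: the true weight is 3.0

results = {}
for batch_size in (8, 64, 256):
    w, updates = train_one_epoch(xs, ys, w=0.0, batch_size=batch_size, lr=0.1)
    results[batch_size] = (w, updates)
    print(f"batch_size={batch_size:3d}  updates={updates:2d}  w={w:.3f}")
```

With 256 samples, a batch size of 8 gives 32 updates in one epoch, while a batch size of 256 gives a single update, so at the same learning rate the small-batch run ends the epoch closer to the true weight. This toy case only covers the "fewer updates per epoch" part of the story. It says nothing about wall-clock speed or about how a real network behaves.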

Where can I read about the details of how batch training works in PyTorch?