By default, drop_last is set to False, so the final, incomplete batch is retained. I understand that complicated models or small datasets may require all of the available data to be used, and that an incomplete final batch can still contribute valuable information.
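To make the behaviour concrete, here is a small pure-Python sketch (no torch import) of the batch sizes a loader would yield; the dataset size of 1000 and batch size of 128 are arbitrary values chosen for illustration:

```python
# Sketch: the size of each mini-batch a loader would yield for a
# hypothetical dataset of 1000 samples with a batch size of 128.
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Return the size of every batch, mirroring DataLoader's drop_last logic."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the incomplete final batch
    return sizes

print(batch_sizes(1000, 128))                  # [128]*7 + [104]
print(batch_sizes(1000, 128, drop_last=True))  # [128]*7
```

With these numbers the default keeps a final batch of 104 samples, while drop_last=True simply discards it.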
However, many datasets contain anomalous samples and potentially incorrect labels. The gradients for these anomalous points are likely to differ greatly from the dataset average, but they usually make a minimal difference when averaged into a large batch. An update computed on a final mini-batch that is smaller than the others and contains a disproportionate number of anomalous points can point far away from the full-dataset gradient direction, and it often increases the loss for the next update, causing noisy spikes in the loss curve.
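A rough numeric sketch of this dilution argument: a single anomalous gradient barely shifts the mean of a full batch, but dominates a small final batch. The gradient magnitudes (0.1 for normal samples, 10.0 for the outlier) are invented purely for illustration:

```python
# One outlier among (batch_size - 1) normal samples: its influence on the
# batch-mean gradient scales roughly as 1 / batch_size.
def batch_mean_gradient(normal_grad, outlier_grad, batch_size):
    """Mean gradient of a batch with one outlier and batch_size - 1 normal samples."""
    return ((batch_size - 1) * normal_grad + outlier_grad) / batch_size

print(batch_mean_gradient(0.1, 10.0, 128))  # ~0.177, close to the normal value
print(batch_mean_gradient(0.1, 10.0, 4))    # ~2.575, dominated by the outlier
```

The same outlier that is nearly invisible in a batch of 128 shifts a batch of 4 by more than an order of magnitude, which is exactly the failure mode described above.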
Standard practice is to train over many epochs, reshuffling the data each time. The model will still see every point many times, even if the final mini-batch is discarded in each epoch. Therefore, I believe that the risks of keeping the final mini-batch far outweigh any benefit, and I cannot understand why drop_last = False is the default in PyTorch, with no warning.
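The claim that dropped samples are still seen can be checked with a quick simulation (pure Python, standard library only; the dataset size, batch size, epoch count and seed are arbitrary choices for the demo):

```python
import random

# Simulate per-epoch shuffling with drop_last=True and count how often
# each sample index is actually trained on.
def seen_counts(n_samples, batch_size, n_epochs, seed=0):
    rng = random.Random(seed)
    counts = [0] * n_samples
    indices = list(range(n_samples))
    n_kept = (n_samples // batch_size) * batch_size  # samples kept per epoch
    for _ in range(n_epochs):
        rng.shuffle(indices)          # a fresh shuffle each epoch
        for i in indices[:n_kept]:    # the trailing remainder is dropped
            counts[i] += 1
    return counts

counts = seen_counts(n_samples=1000, batch_size=128, n_epochs=50)
print(min(counts))  # every sample is still seen many times despite drop_last
```

With 50 epochs, each sample is dropped in any given epoch with probability of only about 104/1000, so even the least-visited sample is trained on dozens of times.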