Why doesn't PyTorch favour the batch dimension first?

Is there a reason why PyTorch doesn't favour putting the batch dimension first when organizing data?

Is that the opposite of TensorFlow's convention or something? Perhaps there is a good reason I'm missing…


In PyTorch, we do have batch-first scenarios. For example, if you load any dataset from torchvision through a DataLoader, the batches come out batch-first. You can change the layout afterwards if needed. I added a small example below.

transform = torchvision.transforms.ToTensor()  # any transform that yields tensors works here
testset = torchvision.datasets.CIFAR10(root=data_path, train=False, download=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

If you iterate over the testloader above, you will get batches like the following.

for i, data in enumerate(testloader, 0):
    inputs, labels = data  # inputs has shape (4, 3, 32, 32); labels has shape (4,)

Here the batch is the first dimension, followed by the channels, and finally height and width (the NCHW layout).

One more thing: batch-first and batch-last are just two different layouts of the same data. The desired representation can be obtained with torch.transpose.
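As a minimal sketch of that last point, the snippet below (with made-up tensor sizes) converts a batch-first sequence tensor to the seq-first layout that PyTorch RNN modules expect by default, simply by swapping the first two dimensions:

```python
import torch

# Hypothetical batch-first tensor: (batch=4, seq_len=10, features=8)
x = torch.randn(4, 10, 8)

# torch.transpose swaps two dimensions; here it converts
# batch-first (batch, seq_len, features) to seq-first (seq_len, batch, features)
y = torch.transpose(x, 0, 1)

print(tuple(y.shape))  # (10, 4, 8)
```

Note that transpose returns a view of the same storage, so this conversion is cheap; call .contiguous() afterwards if a downstream op requires contiguous memory.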