shuffle=False in validation loader decreases performance

Hi everyone, I am developing a model for semantic segmentation. As the title says, I get different performance when I change the shuffle parameter of the validation dataloader. Specifically, with shuffle=True the mIoU computed on the validation data is higher than the mIoU computed on the same data with shuffle=False.

In the following plot you can see the differences:

Here, the orange line shows the mIoU and loss computed using the dataloader with shuffle=True; the blue line shows the same setup with shuffle=False.

As you can see, there is a big difference between the two mIoU curves, even though the losses are more or less the same. Here are some relevant lines of code from my training and evaluation loop:

binary_iou = torchmetrics.classification.BinaryJaccardIndex(ignore_index=IGNORE_INDEX_MIOU).to(device)

for epoch in range(epochs):
    train_miou = 0
    total_loss = 0

    # Train 
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        data, target = random_augmentations(data, target)

        outputs = sl_model(data)
        loss = criterion(outputs, target)
        total_loss += loss.item()
        train_miou += binary_iou(torch.argmax(outputs, dim=1), target)
    # Evaluation
    test_miou = 0
    test_loss = 0
    for batch_idx, (data, target) in enumerate(val_loader):
        data, target = data.to(device), target.to(device)
        with torch.no_grad():
            outputs = sl_model(data)
        test_miou += binary_iou(torch.argmax(outputs, dim=1), target)
        loss = criterion(outputs, target)
        test_loss += loss.item()

I really hope someone can help me, thanks in advance.

Could you describe how binary_iou is implemented, and whether its result depends on which samples end up in each batch? If so, your results would be expected.


@ptrblck yes, I found out that BinaryJaccardIndex does not compute an image-wise IoU. Instead, it flattens the entire batch and computes the IoU as if the whole batch were a single image, so the result depends on how the samples are grouped into batches.
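For anyone who runs into the same thing, here is a minimal sketch of the effect using plain PyTorch. It does not call torchmetrics; batch_iou is my own stand-in for the batch-flattening behaviour described above, and per_image_iou, mean_over_batches and the toy tensors are hypothetical names and data made up for illustration.

```python
import torch


def batch_iou(pred, target):
    """IoU with all pixels of the batch pooled together (batch treated as one image)."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum()
    union = (pred | target).sum()
    return (intersection / union).item()  # NaN if the union is empty


def per_image_iou(pred, target):
    """Mean of the IoUs computed separately for each image (dim 0 is the batch)."""
    pred, target = pred.bool().flatten(1), target.bool().flatten(1)
    intersection = (pred & target).sum(dim=1)
    union = (pred | target).sum(dim=1)
    return (intersection / union).mean().item()  # NaN if any union is empty


def mean_over_batches(metric, pred, target, order, batch_size=2):
    """Average a per-batch metric over batches built from the given sample order."""
    values = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        values.append(metric(pred[idx], target[idx]))
    return sum(values) / len(values)


# Four toy "images" of four pixels each (binary predictions and targets).
preds = torch.tensor([[1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]])
targets = torch.tensor([[1, 1, 1, 1], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]])

order_a = [0, 1, 2, 3]  # e.g. the fixed order with shuffle=False
order_b = [0, 2, 1, 3]  # one possible order with shuffle=True

print("batch-pooled IoU:", mean_over_batches(batch_iou, preds, targets, order_a),
      mean_over_batches(batch_iou, preds, targets, order_b))
print("per-image IoU:   ", mean_over_batches(per_image_iou, preds, targets, order_a),
      mean_over_batches(per_image_iou, preds, targets, order_b))
```

The batch-pooled value changes with the grouping of samples, while the per-image mean stays the same as long as all batches have the same size. As far as I understand the torchmetrics API, another option is to call the metric's update() on every batch and compute() once at the end of the epoch, which accumulates statistics over the whole validation set instead of averaging per-batch values.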
