Dataloader length==1 instead of number of images

Hi there,
I am trying to reproduce the results of the following repository:

While training with the train_mtl.py file, I noticed that the data loader has a length of 1, while I expected it to equal the number of images in the dataset folder.

Can you help me solve this issue? The contributor used PyTorch 0.3.0, but I am using 0.4.1 because I don’t have Linux and that version is no longer available for Windows.

That’s not the case. The length of the DataLoader is defined by the number of batches it can generate from the Dataset. In the default setup it should be ceil(len(dataset)/batch_size). If the batch size is equal to the length of your dataset, only a single batch will be returned.
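To make the rule concrete, here is a small sketch of the arithmetic in plain Python. The helper name expected_num_batches is made up for illustration and is not part of PyTorch; it assumes a map-style dataset with an integer batch_size, and the drop_last branch mirrors the DataLoader option of the same name.

```python
import math


def expected_num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader is expected to report for these settings.

    Hypothetical helper for illustration only, not a PyTorch function.
    """
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # Default: the final, possibly smaller, batch is kept.
    return math.ceil(num_samples / batch_size)


print(expected_num_batches(4000, 16))        # 250
print(expected_num_batches(4000, 4000))      # 1
print(expected_num_batches(4001, 16))        # 251
print(expected_num_batches(4001, 16, True))  # 250
```

So a length of 1 means either the batch size is at least as large as the dataset, or the dataset itself is very small.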

Thank you for your reply.

This is the part of the code that creates the train_loader variable:

train_loader = data.DataLoader(
    __dataset__[args.dataset](
        config["train_dataset"],
        seed=config["seed"],
        is_train=True,
        multi_scale_pred=args.multi_scale_pred,
    ),
    batch_size=config["train_batch_size"],
    num_workers=8,
    shuffle=True,
    pin_memory=False,
)

As you can see, the batch size is set by a parameter in a JSON file. The number of images is 4000 (more or less) and the batch size is 16.
Here is the part of the JSON file that provides this information to the code:

"seed": 7,
    "task1_classes": 2,
    "task2_classes": 37,
    "task1_weight": 1,
    "task2_weight": 1,
    "train_batch_size": 16,
    "val_batch_size": 4,
    "refinement":3,
    "train_dataset": {
        # we are not interested in the spacenet dataset
        "spacenet":{
            "dir": "/data/spacenet/train_crops/",
            "file": "/data/spacenet/train_crops.txt",
            "image_suffix": ".png",
            "gt_suffix": ".png",
            "crop_size": 256
        },
        "deepglobe":{
            "dir": "C:/Users/.../road_connectivity-master/data/deepglobe/train_crops/",
            "file": "C:/Users/.../road_connectivity-master/data/deepglobe/train_crops.txt",
            "image_suffix": ".jpg",
            "gt_suffix": ".png",
            "crop_size": 256
        },
        "crop_size": 256,
        "augmentation": true,
        "mean" : "[70.95016901, 71.16398124, 71.30953645]",
        "std" : "[ 34.00087859, 35.18201658, 36.40463264]",
        "normalize_type": "Mean",
        "thresh": 0.76,
        "angle_theta": 10,
        "angle_bin": 10
    },

I am not an experienced developer, so please excuse me if I am missing something obvious here.

The train function that uses the train_loader is this:

def train(epoch):
    ...
    for i, data in enumerate(train_loader, 0):
        inputsBGR, labels, vecmap_angles = data
        inputsBGR = Variable(inputsBGR.float().cuda())
        outputs, pred_vecmaps = model(inputsBGR)
        ...

        viz_util.progress_bar(
            i,
            len(train_loader),
            "Loss: %.6f | VecLoss: %.6f | road miou: %.4f%%(%.4f%%) | angle miou: %.4f%% "
            % (
                train_loss_iou / (i + 1),
                train_loss_vec / (i + 1),
                miou,
                road_iou,
                miou_angle,
            ),
        )

When I run this code to train the model, the length of the train loader is 1, so it trains on only one image per epoch. Can you help me overcome this issue?

Could you try to narrow down your code to just the data loading? You shouldn’t get a single batch out of your DataLoader using the specified arguments.

Have a look at this code snippet:

import math

import torch
from torch.utils.data import DataLoader, TensorDataset

N = 4000
batch_size = 16
# Dummy dataset with N samples of one feature each
dataset = TensorDataset(
    torch.randn(N, 1)
)
loader = DataLoader(
    dataset,
    batch_size=batch_size
)

print('len(dataset) ', len(dataset))
print('len(loader) ', len(loader))
print('expected number of batches {}'.format(
    math.ceil(N / batch_size)))

# Count the batches the loader actually yields
counter = 0
for data in loader:
    counter += 1

print('loader yields {} batches'.format(counter))

and try to add the print statements into your code. It would be interesting to see the lengths etc.

@ptrblck please check again the previous comment. I have deleted parts of the code that you probably don’t need.

The code looks OK (I'm not sure if the progress_bar is relevant, so I'll ignore it for now), so you would need to add the print statements to your code after creating the Dataset and DataLoader.

In particular, add these lines:

print('len(dataset) ', len(dataset))
print('len(loader) ', len(loader))
print('expected number of batches {}'.format(
    math.ceil(N / batch_size)))

before executing the training loop and post these numbers here.
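If the Dataset is created inline inside the DataLoader call, as in your snippet, you can still get at it through the loader. A minimal sketch, assuming the default drop_last=False and an integer batch_size; report_loader is a name I made up, while .dataset and .batch_size are attributes of DataLoader:

```python
import math


def report_loader(loader):
    """Print dataset size, loader length, and the batch count they imply.

    Works with any object that has `.dataset`, `.batch_size` and `__len__`,
    such as a torch.utils.data.DataLoader.
    """
    n_samples = len(loader.dataset)
    n_batches = len(loader)
    # Assumes drop_last=False, so the last partial batch is counted.
    expected = math.ceil(n_samples / loader.batch_size)
    print('len(dataset) ', n_samples)
    print('len(loader) ', n_batches)
    print('expected number of batches {}'.format(expected))
    return n_samples, n_batches, expected
```

Calling report_loader(train_loader) right after the loader is created should show whether the problem is in the Dataset (too few samples) or in the batching.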

The progress bar is where I see that the train_loader length is 1, while it should be a larger number, corresponding to the number of batches.

The result of the previous code snippet is the following:

len(dataset)  1
len(loader)  1
expected number of batches 2642

I set N=42264 manually, which is the number of images within my training folder.
batch_size=16, set according to the config file.

Apparently your Dataset only contains a single sample.
Could you check the image folders and make sure they contain the expected number of files (also make sure they are not in a subfolder etc.).
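A quick way to check this from Python is sketched below. Both helper names are made up for illustration; pass the "dir" and "image_suffix" values from your config.

```python
from pathlib import Path


def count_files(directory, suffix):
    """Count files directly inside `directory` whose name ends with `suffix`."""
    return sum(
        1 for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )


def count_nested_files(directory, suffix):
    """Count matching files anywhere below `directory`, subfolders included."""
    return sum(
        1 for p in Path(directory).rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    )
```

If count_nested_files returns more than count_files for the same folder, some images are sitting in subfolders. If both match the number of images you expect but len(dataset) is still 1, the problem is more likely in how the Dataset builds its list of samples.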

[Image: dataset]
As you can see in the above image, the path that I am using is the correct one (it matches the config file).

Solved it. One file was responsible for creating the Dataset, and because I am running on Windows while this code was written for Linux, I had to modify the parts of the code that handled newline characters and were related to the paths. Thank you so much for your patience, and I am sorry that you spent your time debugging something that had nothing to do with PyTorch. I wish you a happy remainder of the day.
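For anyone who lands here with the same symptom: the exact change is specific to that repository, but the general idea can be sketched as follows. This assumes the Dataset reads one entry per line from a text file such as train_crops.txt; read_file_list is a hypothetical helper, not the repository's code.

```python
def read_file_list(list_path):
    """Return the non-empty entries of a text file, one entry per line.

    str.splitlines() treats LF, CRLF and CR line endings alike, so the
    result is the same whether the file was written on Linux or Windows.
    """
    # newline="" turns off newline translation, so splitlines() sees the raw endings.
    with open(list_path, "r", newline="") as f:
        text = f.read()
    # strip() also removes stray whitespace around each entry.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Entries read this way carry no trailing carriage return, so paths built from them should resolve on either platform.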

No worries, and I’m glad you’ve found the issue. :slight_smile:
