A problem about the length of DataLoader

Hi all, I encountered a curious problem. My PyTorch code looks like this:

for batch_idx, (imgs, pids, _) in enumerate(trainloader):
    print(batch_idx, len(trainloader))

The problem is that the batch_idx variable never reaches len(trainloader); in other words, the loop does not go through all of the data.

Could someone tell me why this occurs?

Note that I am testing a new dataset, and the dataset itself looks normal. I have also checked the dataset-related code, but I cannot find what causes this problem.

What is the difference between the last batch_idx and len(trainloader)?
Note that the last index would be (len - 1), as Python uses 0-based indexing.
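
For example, here is a minimal sketch with a toy dataset (the names are just placeholders) showing that the last value of batch_idx is len(loader) - 1:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy example: 100 samples, batch_size=10 -> len(loader) == 10,
# while enumerate yields batch_idx 0..9, so the last index is len(loader) - 1.
loader = DataLoader(TensorDataset(torch.arange(100)), batch_size=10)
for batch_idx, batch in enumerate(loader):
    pass
print(batch_idx, len(loader))  # prints: 9 10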

[Screenshot of the training log output]
As the figure shows, the second number is batch_idx and the third number is len(trainloader). The result is printed every 10 batches, and the code actually prints batch_idx + 1 rather than batch_idx. Even so, at least 25 batches (595 - 570) are never printed. It is quite confusing.

Could you post the line of code printing this output?
Also, what is your batch size and how many samples does your Dataset contain?

Yes, here is the code:

for batch_idx, (imgs, pids, _) in enumerate(trainloader):
    # ... some training code ...
    if (batch_idx + 1) % args.print_freq == 0:
        print('Epoch: [{0}][{1}/{2}]\t'
              'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
              'Data {data_time.val:.4f} ({data_time.avg:.4f})\t'
              'Loss {loss.val:.4f} ({loss.avg:.4f})\t'.format(
                  epoch + 1, batch_idx + 1, len(trainloader),
                  batch_time=batch_time, data_time=data_time, loss=losses))

and the trainloader is defined as

trainloader = DataLoader(
    ImageDataset(dataset.train, transform=transform_train),
    sampler=RandomIdentitySampler(dataset.train, args.train_batch, args.num_instances),
    batch_size=args.train_batch, num_workers=args.workers,
    pin_memory=pin_memory, drop_last=True,
)
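
If I understand correctly, with a custom sampler and drop_last=True the length of the DataLoader is derived from len(sampler) // batch_size rather than from the dataset itself. A rough diagnostic I could run (just a sketch, reusing the sampler class and args from the snippet above) would compare the sampler's reported length with the number of indices it actually yields:

# Rough diagnostic sketch (reuses RandomIdentitySampler, dataset.train, args and
# trainloader from above; not verified here). If len(sampler) disagrees with the
# number of indices the sampler actually yields, len(trainloader) would be larger
# than the number of batches the loop can actually produce.
sampler = RandomIdentitySampler(dataset.train, args.train_batch, args.num_instances)
yielded = sum(1 for _ in sampler)  # indices the sampler actually produces
print('len(sampler):', len(sampler))
print('indices actually yielded:', yielded)
print('batches expected with drop_last=True:', yielded // args.train_batch)
print('len(trainloader):', len(trainloader))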

The batch size is set to 128, and there are about 150k samples in the dataset.
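
For reference, a quick arithmetic check with those numbers (assuming the ~150k figure and drop_last=True, and ignoring the custom sampler):

# Batches expected if the length came from the dataset itself, with drop_last=True:
print(150_000 // 128)  # 1171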