Time profiling for input feeding to model?

Hi, I am trying to measure how much time my code spends acquiring the input (including data augmentation). Currently I am using PyTorch's DataLoader. I start a timer at the beginning of the __getitem__ method and stop it right before it returns. Is this the right way to measure the time?
My code looks something like this:

def __getitem__(self, idx):
    ts = time.time()
    # ... load and augment the sample ...
    print("time elapsed for inputs: {}s".format(time.time() - ts))
    return batch

Your approach could work to profile the loading of a single sample.
However, if you want to profile the data loading via a DataLoader (and thus potentially multiple workers), I would recommend using this code snippet from the ImageNet example.
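The ImageNet example accumulates its timings with a small AverageMeter helper. A minimal sketch of such a meter (the class name follows the example; the usage loop below is only illustrative):

```python
import time


class AverageMeter:
    """Tracks the latest value, running sum, count, and average."""

    def __init__(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


# Illustrative usage: time two dummy "data loading" steps
data_time = AverageMeter()
for _ in range(2):
    end = time.time()
    time.sleep(0.01)  # stand-in for loading a batch
    data_time.update(time.time() - end)
print("avg data time: {:.4f}s".format(data_time.avg))
```

Calling `update` once per iteration gives you both the last measured value and the running average over the epoch.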

single sample or a single batch?

The __getitem__ method is usually used to load a single sample using the passed index.
So your code snippet would profile the loading time of a single sample, while the ImageNet example would profile the complete DataLoader, i.e. the time could approach zero if the workers are preloading the next batches fast enough while your GPU is busy.

Right. So improving the loading speed of a single sample translates to faster batch loading and, in turn, faster training? I tried the ImageNet example, but I am confused about the difference between data time and batch time. According to the example:

end = time.time()
for i, batch in enumerate(train_loader):
    # measure the data loading time
    data_time.update(time.time() - end)

    inputs, labels = batch
    outputs = fcn_model(inputs)
    labels = labels.type_as(outputs)
    loss = criterion(outputs, labels)

    # measure the elapsed time for the whole iteration
    batch_time.update(time.time() - end)
    end = time.time()

So data_time is the time the DataLoader takes to load a batch, and batch_time is the time the model takes to process a batch? I think I can only improve the efficiency of the data loading, not the batch processing (batch_time), because that depends on the model, right?

Yes, your statement is correct.
The first iteration of the DataLoader would be slower since all workers are loading a complete batch. If the data loading is not a bottleneck, the data loading time should decrease towards zero.

To accelerate the model, you could use e.g. mixed-precision training and check if it yields a speedup.
Also, setting torch.backends.cudnn.benchmark = True lets cudnn profile its kernels for each new input shape and could accelerate the training.
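A minimal sketch combining both suggestions. The model, optimizer, and data here are stand-ins for illustration; the snippet falls back to full precision when no GPU is available:

```python
import torch
import torch.nn as nn

# Let cudnn benchmark kernels for each new input shape
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # mixed precision only on the GPU

model = nn.Linear(10, 2).to(device)  # stand-in for the actual model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(4, 10, device=device)
labels = torch.randint(0, 2, (4,), device=device)

optimizer.zero_grad()
# Run the forward pass in mixed precision (no-op on CPU here)
with torch.autocast(device_type=device, enabled=use_amp):
    outputs = model(inputs)
    loss = criterion(outputs, labels)
# Scale the loss to avoid underflowing float16 gradients
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

With `enabled=False` both autocast and GradScaler are transparent pass-throughs, so the same training loop works on CPU and GPU.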