Tensor.item() takes a lot of running time

Hi,

I am facing the same issue. In my case, CUDA_LAUNCH_BLOCKING=1 reduces the time taken by .item(), but it increases the forward+backward pass time by an equal amount, so total training time remains the same.

I also came across the thread "Synchronization slow down caused by .item() which is not caused by .data[0]", which suggests that using .data[0] instead of .item() is faster. In my case, however, .data[0] itself takes almost no time, but it increases the time taken by data loading. So again, total training time remains the same.
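My understanding of why the time just moves around: .item() copies a scalar from the GPU to the CPU, which forces a full device synchronization, so the "time spent in .item()" is mostly the queued forward/backward kernels draining. One thing I have not tried yet is reducing the number of synchronization points. A minimal sketch (the loop and names here are illustrative, not from my actual training code): accumulate the loss on the GPU and call .item() only every N steps.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
log_every = 10

# keep the running loss on the GPU so accumulation does not synchronize
running_loss = torch.zeros((), device=device)

for step in range(100):
    # stand-in for the per-step loss; in a real loop this comes from the model
    loss = torch.rand((), device=device)
    running_loss += loss.detach()  # GPU-side add, no CPU round-trip

    if (step + 1) % log_every == 0:
        # only this call synchronizes -- once every `log_every` steps
        # instead of every step
        print(f"step {step + 1}: avg loss {running_loss.item() / log_every:.4f}")
        running_loss.zero_()
```

This does not make the GPU work cheaper, but it cuts the number of forced syncs from one per step to one per `log_every` steps, which should show up if the per-call sync overhead (rather than the kernel time itself) is a meaningful fraction.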

Any other suggestions/fixes that I can try?

EDIT: For my example, num_workers = 8 and batch_size = 32. Modifying num_workers only impacts the data loading time, as expected.

EDIT 2: The increase in data loading time when I use .data[0] instead of .item() occurs when I move images to the GPU (images.cuda()) after the dataloader returns a tensor containing 32 images.
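That would be consistent with the synchronization just moving again: images.cuda() on an ordinary (pageable) host tensor is a blocking copy, so it waits for outstanding GPU work, and the time that .data[0] no longer accounts for shows up there. A sketch of the usual mitigation, assuming a standard DataLoader setup (the dataset here is a random stand-in, not my actual data): pin host memory in the loader and issue the copy with non_blocking=True so it can overlap with compute.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-in dataset: 256 fake 3x32x32 images with integer labels
dataset = TensorDataset(torch.randn(256, 3, 32, 32),
                        torch.randint(0, 10, (256,)))

# pin_memory=True puts batches in page-locked host memory,
# which is required for the copy below to actually be asynchronous
loader = DataLoader(dataset, batch_size=32, pin_memory=True)

use_cuda = torch.cuda.is_available()
for images, labels in loader:
    if use_cuda:
        # non_blocking=True lets the host-to-device copy overlap
        # with previously queued GPU kernels
        images = images.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)
    # ... forward / backward ...
```

This does not remove the transfer cost either, but it can hide it behind the forward/backward work instead of paying it as a visible stall at images.cuda().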
