Is it important for an (image-based) dataloader to return a torch.Tensor instead of a numpy array?

The DatasetMapper of detectron has the following interesting comment in its codebase (for the function returning the inference / train dataset):

    # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
    # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
    # Therefore it's important to use torch.Tensor.

The comment is just above the following line of code:

    dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))

Is there some source for this statement? Can somebody explain possible reasons for this? How could I test this?

I would guess the comment might refer to this issue and description.

Thanks for your answer, @ptrblck .

I am not sure if this refers to the same problem. I think the issue you mentioned refers to using Python objects as member variables of the data loader. I think the same issue was discussed here: Demystify RAM Usage in Multi-Process Data Loaders - Yuxin's Blog

To me, the comment in the detectron codebase rather seems to be about returning Python/numpy objects vs. PyTorch objects from the dataloader, doesn't it? Maybe I should ask directly on the detectron GitHub page…

I might be wrong and the linked issue was the only one which sounded familiar. Yes, this sounds like a good idea and please post the answer here in case you get one as I would also be interested to learn more about their concerns.

A very interesting answer to the question was posted, which I repeat here:

Large tensors are better returned as torch.Tensor than as numpy arrays, because torch.Tensor objects are pickled through shared memory (pytorch/ at fd3a7264ae80332ba5ec8f60446e6d7a2c2276c1 · pytorch/pytorch · GitHub) while numpy arrays are not.

The answer can be found here.
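
For anyone who wants to test this locally, here is a minimal sketch of one possible check. It assumes that DataLoader workers hand their results back through a multiprocessing queue, which serializes objects with `ForkingPickler`, and that importing `torch.multiprocessing` registers PyTorch's custom tensor reducers with that pickler. The helper `payload_nbytes` and the 3x512x512 sample are made up for illustration and are not taken from detectron. The sketch compares how many bytes would actually be pickled for a numpy array and for a tensor holding the same data:

```python
import numpy as np
import torch
import torch.multiprocessing  # importing this registers torch's reducers with ForkingPickler
from multiprocessing.reduction import ForkingPickler


def payload_nbytes(obj):
    """Hypothetical helper: size of the pickled payload an mp.Queue would send for obj."""
    return len(ForkingPickler.dumps(obj))


# Made-up image-sized sample in CHW layout (uint8), about 768 KiB of pixel data.
array = np.zeros((3, 512, 512), dtype=np.uint8)
tensor = torch.as_tensor(array.copy())

array_payload = payload_nbytes(array)    # the full pixel buffer is copied into the pickle
tensor_payload = payload_nbytes(tensor)  # only a handle to shared memory plus metadata

print(f"raw data size:  {array.nbytes} bytes")
print(f"numpy payload:  {array_payload} bytes")
print(f"tensor payload: {tensor_payload} bytes")
print(f"tensor storage in shared memory: {tensor.is_shared()}")
```

If the assumptions hold, the numpy payload should be at least as large as the raw data, while the tensor payload should stay small regardless of the image size, since the pixel data is moved to shared memory instead of being copied through the queue. This only measures payload size; an end-to-end timing with a real DataLoader and several workers would be needed to see how much it matters in practice.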