How to use multithreading in the PyTorch DataLoader with pybind functions that release the GIL?

Half of my project is written in C++, exposed through pybind wrappers that release the GIL, and it uses C++ classes that cannot be pickled, so I can only use a multi-threaded loader, not a multi-process one.
Does PyTorch implement multi-threaded workers for the DataLoader? If not, why?

No, because the Python GIL would block the threads, so multithreading wouldn’t yield any speedup.

A C++ function that uses pybind11::gil_scoped_release avoids the Python GIL problem, so a multi-threaded dataloader can be very fast, with no pickle/IPC overhead, provided all the bottleneck functions in the dataloader code are written in C++. I think we should add an option to use thread workers.
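To make the idea concrete, here is a minimal sketch of a thread-based loader built only on the standard library. It is not a PyTorch API: `ToyDataset`, `load_sample`, and `threaded_batches` are hypothetical names, and `time.sleep` (which releases the GIL) stands in for a pybind11 function wrapped in `pybind11::gil_scoped_release`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_sample(index):
    # Hypothetical stand-in for a C++ function exposed through pybind11
    # with gil_scoped_release. time.sleep also releases the GIL, so other
    # threads can run while this call is "working".
    time.sleep(0.01)
    return index * 2


class ToyDataset:
    """Map-style dataset: only __len__ and __getitem__ are required."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        return load_sample(index)


def threaded_batches(dataset, batch_size, num_threads):
    """Yield batches (lists of samples) in order, loading samples on threads.

    Nothing is pickled and no IPC happens: worker threads share the
    dataset object with the caller.
    """
    n = len(dataset)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for start in range(0, n, batch_size):
            batch_indices = range(start, min(start + batch_size, n))
            # pool.map returns results in the order of the input indices.
            yield list(pool.map(dataset.__getitem__, batch_indices))


if __name__ == "__main__":
    for batch in threaded_batches(ToyDataset(10), batch_size=4, num_threads=4):
        print(batch)
```

Because the workers are threads, the dataset can hold unpickleable C++ objects. Whether this is actually faster depends on how much of `__getitem__` runs with the GIL released.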

Would this mean that the standard approach of using e.g. ImageFolder or a custom Dataset wouldn’t be suitable for this approach?
If so, I think it might be a great extension (so feel free to create a feature request on GitHub for it and explain your interest in implementing it).

The general reason is that multithreading is not fast in Python. For the DataLoader use case, most users write their Dataset in pure Python, so providing thread workers wouldn’t give them any performance gain.
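The difference between the two cases can be checked with a small standard-library experiment. This is only a sketch: `pure_python_work`, `gil_releasing_work`, and `timed` are hypothetical helpers, and `time.sleep` again stands in for a C++ call that releases the GIL. On a standard GIL-enabled CPython build, the pure-Python tasks are expected to run roughly one at a time even with several threads, while the GIL-releasing tasks overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def pure_python_work(n):
    # CPU-bound pure-Python loop: the thread holds the GIL while it runs,
    # so threads executing this function take turns instead of overlapping.
    total = 0
    for i in range(n):
        total += i * i
    return total


def gil_releasing_work(seconds):
    # Stand-in for a C++ function that releases the GIL; time.sleep
    # releases it too, so several of these calls can overlap.
    time.sleep(seconds)
    return seconds


def timed(fn, args, num_threads):
    """Run fn over args on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for threads in (1, 4):
        _, t_py = timed(pure_python_work, [200_000] * 4, threads)
        _, t_rel = timed(gil_releasing_work, [0.05] * 4, threads)
        print(f"threads={threads}: pure Python {t_py:.3f}s, GIL-releasing {t_rel:.3f}s")
```

The exact timings depend on the machine and the Python build, so the script prints measurements rather than asserting a particular speedup.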

However, we are working on the new DataLoader-DataPipe, and hopefully we can provide a multithreaded DataLoader in the near future. You could then use it to call C++ functions that release the Python GIL.