I’m using a DataLoader with num_workers > 0. I noticed that even with a small number of workers, the main process becomes the bottleneck: it can’t consume the data as fast as the workers produce it. A first look suggests that the problem is the serialization overhead between processes. Interestingly, the main process is CPU-bound rather than I/O-bound.
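One rough way to check the serialization hypothesis is to time a pickle round trip on a representative sample outside the DataLoader. This assumes the batches are pickled on their way from the workers, which would put the unpickling cost on the main process. Below is a minimal stdlib-only sketch. `make_sample_list` and `make_sample_buffer` are hypothetical stand-ins for whatever the dataset actually returns. The sketch contrasts many small Python objects with one contiguous buffer holding the same values.

```python
import pickle
import time
from array import array


def make_sample_list(n=100_000):
    """Hypothetical sample: a plain Python list of floats (many small objects)."""
    return [float(i) for i in range(n)]


def make_sample_buffer(n=100_000):
    """Hypothetical sample: the same values packed into one contiguous buffer."""
    return array("d", (float(i) for i in range(n)))


def time_roundtrip(obj, repeats=20):
    """Return (average seconds per dumps+loads round trip, pickled size in bytes)."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    start = time.perf_counter()
    for _ in range(repeats):
        pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, len(payload)


if __name__ == "__main__":
    for name, obj in [("list", make_sample_list()), ("array", make_sample_buffer())]:
        secs, nbytes = time_roundtrip(obj)
        print(f"{name:>5}: {secs * 1e3:.2f} ms per round trip, {nbytes} bytes")
```

If the per-sample round-trip time multiplied by the samples per second comes close to a full core, serialization is a plausible culprit. The list-versus-buffer contrast only illustrates that object layout drives the cost. It is not a DataLoader setting.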
Are there any options to reduce the serialization overhead? For example, is there a way to use Apache Arrow for zero-copy data transport?