Multiprocessing and PyTorch Profilers

I’m trying to profile the CPU time spent in dataset preprocessing (augmentation) and dataloaders.

In most environments, dataloaders run in multiprocessing mode, where many workers in child processes fetch the input data. torch.autograd.profiler does support such environments, but it only keeps track of the main thread: according to the torch profiler documentation, it does not profile the execution of the child processes, to prevent mixed results.

Is there a way to get the individual profiles of the child data-fetching processes? If possible, I would like to find out the average performance of the child processes, which should approximate the performance of the dataloader as a whole.
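As a first step I can at least measure throughput as seen from the main process. The helper below is my own sketch (`time_loader` is not a torch API); it treats the loader as a plain iterable, so any `DataLoader` works, but it lumps worker time, IPC, and collation together:

```python
import time

def time_loader(loader, warmup=5):
    """Sketch: average wall-clock seconds per batch, measured from the
    main process. This mixes worker time, IPC, and collation, so it is
    a proxy for whole-dataloader speed, not a per-worker profile."""
    it = iter(loader)
    for _ in range(warmup):  # skip worker start-up cost
        next(it)
    times = []
    start = time.perf_counter()
    for _ in it:
        now = time.perf_counter()
        times.append(now - start)
        start = now
    return sum(times) / len(times)

# usage with any iterable standing in for a DataLoader:
avg = time_loader(range(100))
print(f"{avg * 1e6:.2f} us/batch")
```

This tells me when the dataloader is the bottleneck, but not *why*, which is what the per-worker profiles would answer.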

I think a mechanism for a child process to return its profile object to the main process upon termination could come in handy. I’ve seen worker_init_fn but no worker_end_fn in torch/utils/data/. Is there anything of the kind? Or should I just get my hands dirty and modify the code myself?