Multi-threaded ops on CPU

Hi, is there an equivalent of the TensorFlow thread pool? Why isn't it used, e.g., in the torchvision CPU ops? Is there any documentation on how to use it?

I think ATen/Parallel.h is a reasonable interface for intra-op parallelism.
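To give a feel for the interface: `at::parallel_for(begin, end, grain_size, f)` splits the range `[begin, end)` into chunks and calls `f(chunk_begin, chunk_end)` on each one. Below is a rough sketch of that call shape, not the ATen implementation. `parallel_for_sketch` is a hypothetical stand-in built on `std::thread` so it compiles without libtorch, and `box_areas` with its `(x1, y1, x2, y2)` box layout is just an assumed example of a per-box loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in that mimics the call shape of at::parallel_for.
// It splits [begin, end) into contiguous chunks and calls
// f(chunk_begin, chunk_end) once per chunk, each chunk on its own thread.
template <typename F>
void parallel_for_sketch(int64_t begin, int64_t end, int64_t grain_size,
                         const F& f) {
  assert(grain_size >= 0);
  if (begin >= end) return;
  const int64_t n = end - begin;
  const int64_t hw =
      std::max<int64_t>(1, std::thread::hardware_concurrency());
  // Use at most n / grain_size chunks so small ranges stay on one thread.
  const int64_t max_chunks =
      std::max<int64_t>(1, n / std::max<int64_t>(1, grain_size));
  const int64_t num_chunks = std::min(hw, max_chunks);
  if (num_chunks == 1) {
    f(begin, end);
    return;
  }
  const int64_t chunk = (n + num_chunks - 1) / num_chunks;  // ceil division
  std::vector<std::thread> workers;
  for (int64_t start = begin + chunk; start < end; start += chunk) {
    const int64_t stop = std::min(end, start + chunk);
    workers.emplace_back([&f, start, stop] { f(start, stop); });
  }
  f(begin, std::min(end, begin + chunk));  // calling thread runs chunk 0
  for (auto& w : workers) w.join();
}

// Assumed example loop: area of each box stored as (x1, y1, x2, y2).
// Every index i is written by exactly one chunk, so no locking is needed.
std::vector<float> box_areas(const std::vector<float>& boxes) {
  const int64_t num_boxes = static_cast<int64_t>(boxes.size() / 4);
  std::vector<float> areas(num_boxes);
  parallel_for_sketch(0, num_boxes, /*grain_size=*/1024,
                      [&](int64_t chunk_begin, int64_t chunk_end) {
                        for (int64_t i = chunk_begin; i < chunk_end; ++i) {
                          const float* p = &boxes[4 * i];
                          areas[i] = (p[2] - p[0]) * (p[3] - p[1]);
                        }
                      });
  return areas;
}
```

In an extension built against libtorch, the same lambda should work if you include `ATen/Parallel.h` and replace `parallel_for_sketch` with `at::parallel_for`, which runs the chunks on PyTorch's intra-op thread pool instead of spawning threads per call. The sketch does not handle exceptions thrown inside `f`.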

Regarding the ops: The extension ops implemented in torchvision are for FastRCNN/MaskRCNN: NMS, ROIAlign, ROIPool.

I do not have any special insight, but my impression from timing MaskRCNN-Benchmark in the FBNet variant is that the ops implemented in torchvision itself account for a relatively small part of the overall computation of a FastRCNN model, with the bulk of the time spent in the backbone's convolutions. In addition, the CPU implementation does not seem to be particularly optimized; it looks like it is mainly provided for completeness (and probably isn't an urgent optimization target). If you were serious about optimizing it, you would likely want to look at vectorization (AVX2) before threading.

Best regards


Thank you @tom for the relevant header.
I mentioned torchvision because I'm looking for a reference example of how to parallelize for loops in custom ops, and the different variants of ROI transformations are typical examples. Of course there is some heavy stuff in the base model, but I assume it's already tuned to the maximum. I just want to speed up custom/OpenCV post-processing routines.