Understanding PyTorch intra-op parallelism

I am trying to understand how PyTorch's intra-op parallelism applies to model inference. My question is: does intra-op parallelism provide any benefit when inference runs on the GPU? My understanding is that it gives great results when inference takes place on the CPU. Any clarification in this regard would be a great help.
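For context, this is how I have been controlling the intra-op thread count for CPU inference (a minimal sketch using `torch.set_num_threads`, which tunes the threads used *within* a single op such as a matmul; `torch.set_num_interop_threads` controls the separate inter-op pool and must be called before any parallel work starts):

```python
import torch

# Inter-op threads: run independent ops concurrently.
# Must be set once, before any parallel region is entered.
torch.set_num_interop_threads(2)

# Intra-op threads: parallelize the work inside one op (e.g. a CPU matmul).
torch.set_num_threads(4)

x = torch.randn(512, 512)
with torch.inference_mode():
    y = x @ x  # this single matmul can use up to 4 intra-op threads on CPU

print(torch.get_num_threads())  # 4
```

My assumption is that these settings only affect CPU kernels, and that GPU kernels schedule their own parallelism via CUDA, which is what I am hoping someone can confirm.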
Thanks.