Which of the CPU implementations is used?

Good day!

I’m experimenting with simple networks and trying to figure out which of the implementations was actually used during training/inference.

I’m using a simple net (embeddings -> LSTM -> Linear -> Sigmoid), np.float32 data, and PyTorch 1.9.1+cu111 (installed via pip).

Browsing the source code showed that a lot of functions have 3 implementations: xnnpack, MKL and MKLDNN (ideep). And the output of torch.__config__.show() mentions that both MKL and MKLDNN are used.
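
For reference, a minimal sketch of how the build configuration can be inspected from Python. It only reports which backends the installed binary was compiled with, not which one a given operation dispatches to at run time. The variable names are just for illustration.

```python
import torch

# Human-readable build summary (compiler, BLAS, MKL / MKL-DNN flags, etc.).
config_text = torch.__config__.show()
print(config_text)

# Whether this binary was built with each backend. These flags describe
# availability only; they do not say which kernel an op actually uses.
has_mkl = torch.backends.mkl.is_available()
has_mkldnn = torch.backends.mkldnn.is_available()
print("MKL available:    ", has_mkl)
print("MKL-DNN available:", has_mkldnn)
```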

The question is: what is the easiest way to find out which of the CPU implementations (e.g. of the Linear operation) was actually used during training or inference? Is there an explicit priority order between these implementations? Or is there some debug mode that would let me see the call stack?
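
One candidate for such a debug view might be the autograd profiler, which lists the ATen operators that ran on the CPU. The sketch below is only an illustration under assumptions: TinyNet and all sizes are hypothetical stand-ins for the network described above, and the operator names may or may not reveal the underlying backend (an op name containing "mkldnn" would point to MKL-DNN, while a generic name such as a matrix multiply says nothing about whether MKL handled it).

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in: embeddings -> LSTM -> Linear -> Sigmoid."""

    def __init__(self, vocab_size=100, emb_dim=16, hidden_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):
        out, _ = self.lstm(self.emb(tokens))
        # Use the last time step of the LSTM output for the prediction.
        return torch.sigmoid(self.fc(out[:, -1]))


model = TinyNet().eval()
tokens = torch.randint(0, 100, (4, 10))  # batch of 4 sequences, length 10

# Record every operator executed during one forward pass on the CPU.
with torch.no_grad(), torch.autograd.profiler.profile() as prof:
    y = model(tokens)

# Collect the operator names and print a summary sorted by CPU time.
op_names = {evt.key for evt in prof.key_averages()}
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
```

Separately, Intel MKL and oneDNN document the environment variables MKL_VERBOSE=1 and DNNL_VERBOSE=1 (MKLDNN_VERBOSE=1 in older releases), which make the libraries log each call they execute. Whether they take effect with the pip build of PyTorch is an assumption that would need to be checked.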