I want to analyze the PyTorch code implemented with AVX instructions

I’m trying to study the low-level implementation of some neural network computations. I built PyTorch from source with the following command:

DEBUG=1 NO_CUDA=1 python setup.py develop

I’ve succeeded in debugging torch.add at the C++ source level with cgdb (I set a breakpoint on the at::native::add function). But what I want to analyze is the SIMD instructions. I know PyTorch has added specialized AVX and AVX2 intrinsics for Tensor operations, and AVX (not AVX2) is what I’m interested in. But I cannot step down to that level.

  • aten/src/TH/vector/AVX.cpp
  • aten/src/ATen/cpu/vec256/*

I think the source files above are the candidates, but setting breakpoints on functions inside them failed. The code I’m testing with is a simple add/sub script and the CNN code from the PyTorch tutorial, but neither ever reaches the functions in those source files.
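For reference, this is roughly how I attach the debugger (test_ops.py is just a stand-in name for my script, which only calls torch.add and torch.sub on CPU float tensors):

    $ gdb --args python test_ops.py
    (gdb) set breakpoint pending on
    (gdb) break at::native::add
    (gdb) run

The breakpoint on at::native::add binds and is hit, but the ones I set on functions from AVX.cpp or the vec256 headers never do.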

I’m curious which PyTorch code paths (invoked from Python) end up executing AVX instructions, and I want to know whether I missed some options when I built it.

I’ve googled about this, but I couldn’t figure it out. A lot of the advice on this kind of topic deals with CMake, and I couldn’t follow it because I’m not familiar with CMake.


This could be due to inlining (at least when AVX is detected): once the compiler inlines the vectorized helpers into the surrounding kernel, there is no separate function symbol left for the debugger to set a breakpoint on.
Most of the action with respect to using AVX happens in ATen’s native/cpu directory. The kernels there make heavy use of the TensorIterator mechanism, which is defined in the ATen native directory and provides a common infrastructure for plain and SIMD CPU ops as well as GPU ops; it handles most pointwise operations and also reductions.
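To make that concrete, here is a minimal, self-contained sketch of the pattern those kernels compile down to. This is illustrative only, not the actual ATen code; in the real tree the vectorized path goes through Vec256<float> (whose operator+ is essentially a wrapper around _mm256_add_ps) inside the TensorIterator-driven loops:

    // Illustrative sketch, not ATen code: the shape of a vectorized float
    // add, with an AVX main loop and a scalar tail, similar to what the
    // Vec256-based kernels in native/cpu produce.
    // Compile with: g++ -O2 -mavx add_sketch.cpp
    #include <immintrin.h>
    #include <cstdio>

    static void add_kernel(const float* a, const float* b, float* out, long n) {
      long i = 0;
      // Main loop: 8 floats per iteration; _mm256_add_ps emits one vaddps.
      for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
      }
      // Scalar tail for the remaining n % 8 elements.
      for (; i < n; ++i) out[i] = a[i] + b[i];
    }

    int main() {
      float a[11], b[11], out[11];
      for (int i = 0; i < 11; ++i) { a[i] = float(i); b[i] = 2.0f * i; }
      add_kernel(a, b, out, 11);
      printf("out[10] = %f\n", out[10]);  // 30.0
      return 0;
    }

In a debugger, setting a breakpoint on a source line inside such a kernel (break File.cpp:LINE) tends to work even when the function symbol itself has been inlined away, and cgdb’s disassembly view will then show the vaddps instructions directly.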
I would not look too much at the (legacy Torch) aten/src/TH* files, as we’re working on moving things over to the fancy new way.

Best regards

Thomas
