How to compile a Python wheel for a PyTorch extension against a particular PyTorch release

Hi all!

I’m relatively new to PyTorch.

I want to create a Python wheel for a PyTorch extension (in this case qwopqwop200/GPTQ-for-LLaMa on GitHub, a 4-bit quantization of LLaMA using GPTQ), but I want to do it using a Docker image and with a particular release of PyTorch.

In this particular case I want to compile the extension so it links against the same libs used by the PyTorch 1.13.1 + ROCm 5.2 + Python 3.9 release (Linux). That release was installed using the pip install mechanism.
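
For reference, this is how the target environment can be reproduced (it uses the same extra index URL that appears again in step 2 further down):

$ pip3.9 install torch==1.13.1+rocm5.2 --extra-index-url https://download.pytorch.org/whl/rocm5.2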

I need to use 1.13.1 because 2.x doesn’t work on my Radeon RX 5500 XT board (memory faults with 2.0, 2.1 and a nightly snapshot I tried).

The above-mentioned project contains C++ code plus hipified kernels. I was able to compile the code using an official AMD Docker image containing ROCm 5.2.3 + PyTorch 1.12.1 + Python 3.7. But I need to use the binary in a Docker image that has PyTorch 1.13.1 + ROCm 5.2 + Python 3.9 installed, so that wheel is useless.
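
If it helps anyone, the official AMD images live under rocm/pytorch on Docker Hub; I pulled mine with something along these lines (I’m quoting the tag from memory, so double-check it against the available tags):

$ docker pull rocm/pytorch:rocm5.2.3_ubuntu20.04_py3.7_pytorch_1.12.1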

I want to avoid installing two versions of ROCm in the same Docker image, so I want to recycle the ROCm libs bundled with PyTorch 1.13.1. I could use more than one ROCm to create the wheel in a GitHub workflow and then fetch the wheel as an artifact to install it in the final Docker image, but I don’t want to create a 30 GB image with 2 ROCm versions, 2 PyTorch versions and 3 Python versions (like the official AMD images do).

I know the PyTorch project uploads Docker images with the tools needed to create the PyTorch wheels. I think they are on Docker Hub under pytorch/manylinux-builder.

But I’m confused about:

  1. Which one should I download? I see a lot of rocm5.2* tags. What’s the difference? Can I use any of them? Which one was used to generate the PyTorch 1.13.1 release?
  2. Once I select a Docker image, should I install the PyTorch 1.13.1 + ROCm 5.2 wheels there using pip install?
  3. How do I install Python 3.9? I think pytorch/builder (the continuous builder and binary build scripts for PyTorch, on GitHub) uses conda to generate the different wheels; is this correct? Where is an example?

Sorry for the long question.

Ok, I think the following works:

  1. docker pull pytorch/manylinux-builder:rocm5.2. I don’t know what the different rocm5.2* tags mean; I just downloaded the default one.
  2. /opt/python/cp39-cp39/bin/pip3.9 install torch==1.13.1+rocm5.2 --extra-index-url https://download.pytorch.org/whl/rocm5.2. Each supported Python is installed under /opt/python/VERSION, so you can run 3.7, 3.8, 3.9, 3.10 or 3.11 as needed. With this I installed the PyTorch release that I need for my target.
  3. cd /dockerx/TextUI/GPTQ-for-LLaMa/. This is where I already cloned the extension.
  4. /opt/python/cp39-cp39/bin/python setup_rocm.py bdist_wheel
  5. The result is in dist/quant_cuda-0.0.0-cp39-cp39-linux_x86_64.whl. It contains support for all the AMD GPU architectures; I couldn’t figure out how to force ninja to generate just one target, since HCC_AMDGPU_TARGET is ignored (see the note after this list).
  6. I moved the wheel to the target environment and installed it using pip install /dockerx/TextUI/GPTQ-for-LLaMa/dist/quant_cuda-0.0.0-cp39-cp39-linux_x86_64.whl
  7. I exported all the ROCm libs included in the PyTorch wheel at system level:
$ # Symlink every shared library bundled with the torch wheel into the system lib dir
$ for i in /usr/local/lib/python3.9/dist-packages/torch/lib/*.so; do ln -s "$i" /usr/lib/x86_64-linux-gnu/; done
$ # Also create the .so.0 names that some consumers look for
$ for i in /usr/local/lib/python3.9/dist-packages/torch/lib/*.so; do ln -s "$i" "/usr/lib/x86_64-linux-gnu/$(basename "$i").0"; done
$ # libamdhip64 is resolved as .so.5, so give it that name too
$ ln -s /usr/local/lib/python3.9/dist-packages/torch/lib/libamdhip64.so /usr/lib/x86_64-linux-gnu/libamdhip64.so.5
  8. I verified it works: python /dockerx/TextUI/GPTQ-for-LLaMa/test_kernel.py benchmarked it:
Benchmarking LLaMa-7B FC2 matvec ...
FP16: 0.0034637656211853026
2bit: 0.001281311273574829
2bit: 0.001386340856552124 (faster)
3bit: 0.0009310600757598877
3bit: 0.001051180601119995 (faster)
4bit: 0.000908695936203003
4bit: 0.0009802179336547852 (faster)
8bit: 0.0007112977504730224

It then verified various cases; I’m not sure exactly how, but all the results seem to be the same for the simulated and the kernel versions. I guess it works.
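
A note on the “all the AMD GPU architectures” issue from step 5: if I’m reading torch.utils.cpp_extension right, the architecture list for ROCm builds is taken from the PYTORCH_ROCM_ARCH environment variable, not HCC_AMDGPU_TARGET, so something like the following might limit the build to just my board (the RX 5500 XT should be gfx1012, but confirm with rocminfo). I haven’t verified this against 1.13.1:

$ /opt/rocm/bin/rocminfo | grep -o -m1 'gfx[0-9a-f]*'   # confirm the arch name first
$ PYTORCH_ROCM_ARCH=gfx1012 /opt/python/cp39-cp39/bin/python setup_rocm.py bdist_wheel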
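
Also, to double-check that the extension really resolves its ROCm libraries through the symlinks from step 7 rather than some other ROCm install, something like this should show where each dependency comes from (the exact .so filename may differ on your install):

$ ldd /usr/local/lib/python3.9/dist-packages/quant_cuda*.so | grep -E 'hip|torch|not found'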

Is this correct?