adsharma
(Arun Sharma)
June 11, 2024, 5:05pm
1
I’m debugging performance problems with an application related to advanced tensor indexing in the autograd engine. I wasn’t able to come up with a minimal repro I can share here.
One of the recent commits sounds promising based on what I see in the profile. I’d like to test if it really fixes the performance problem.
With some effort I got nightly builds of torch + torchvision installed. It appears that if I pick torch from date N, I need to go with the torchvision nightly from date N+1 to be compatible. However, I couldn’t figure out how to find compatible versions of xformers and pytorch3d (other libs the app uses). Any hints?
Also, I learned that all these nightlies support only Python 3.12 on Linux, with no support for Python 3.10. Is that accurate?
You can just copy/paste the install command from here to install the nightly binaries without worrying about matching version tags yourself.
No, nightly binaries support Python 3.8-3.12.
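For reference, the nightly install command for a given CUDA version looks like this (cu124 shown as an example; the exact index URL comes from the selector on pytorch.org):
```bash
# install matching torch + torchvision nightlies from the cu124 index
pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu124
```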
adsharma
(Arun Sharma)
June 13, 2024, 10:39pm
3
Thanks. Trying it out now.
However, installing xformers pulls in torch-2.3.0 and cu121. I don’t know what happens when multiple CUDA and torch versions are in play.
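For what it’s worth, a quick way to check which builds actually ended up active in the environment (generic checks; the output will vary):
```bash
# report the active torch build and the CUDA version it was compiled against
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# report the installed xformers build and which kernels are usable
python -m xformers.info
```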
Is this the recommended way to install xformers from source to work with pytorch nightly + cu124?
GitHub issue, opened 12 Jan 2024 (UTC):
I have seen many users in the community run into compilation and build problems, and many failures, especially when using new versions of CUDA or PyTorch, or when updating CUTLASS or Flash Attention.
So I wrote a tutorial on how to use Docker (NVIDIA’s monthly releases, which include the latest CUDA, Torch, etc.) to fully install and build xFormers.
## guide
- https://soulteary.com/2024/01/12/xformers-source-code-compilation-with-nvidia-docker.html
- (backup) https://zhuanlan.zhihu.com/p/677516241
- dockerfile example: https://github.com/soulteary/docker-stable-diffusion-webui/blob/main/docker/Dockerfile.xformers
I believe people who have run into the same problem should be able to complete the build by following the ideas in the article.
The article is in Chinese; if you prefer reading in English, Google Translate handles it well.
Even without translating, the command lines in the article should be enough to follow. I wish everyone good luck and hope it saves you time.
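For context, everything below runs inside one of NVIDIA’s monthly PyTorch containers; a sketch of starting one (the 23.12-py3 tag is an assumption, chosen to match the torch 2.2.0a0 build in the report below):
```bash
# start an NVIDIA monthly PyTorch container with GPU access,
# mounting the working directory at /app to match the paths below
docker run --gpus all -it --rm -v "$PWD":/app -w /app nvcr.io/nvidia/pytorch:23.12-py3
```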
## main steps
download the complete source code, which includes xformers, Flash Attention, CUTLASS, and so on:
```bash
git clone --recursive https://github.com/facebookresearch/xformers.git --depth 1
```
update the 3rd-party source code:
```bash
# update Flash Attention to the latest main
cd xformers/third_party/flash-attention
git pull origin main
# optionally pin CUTLASS to the version that matches Flash Attention
# cd ../cutlass
# git pull origin main
# git checkout v3.3.0
```
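If the pinned checkout fails because the tags are missing (which can happen with shallow clones), fetching the tags first helps (a sketch):
```bash
cd xformers/third_party/cutlass
git fetch --tags origin   # make sure the release tags are present locally
git checkout v3.3.0
```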
set the git config (safe.directory is needed when the repo is owned by a different user than the one building inside the container):
```bash
git config --global --add safe.directory /app/xformers
git config --global --add safe.directory /app/xformers/third_party/flash-attention
git config --global --add safe.directory /app/xformers/third_party/cutlass
```
install build deps:
```bash
pip install ninja
```
build and install in editable mode (this compiles the CUDA kernels, so it takes a while):
```bash
pip install -v -e .
```
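If you are building for a single GPU, limiting the architecture list cuts compile time considerably; a sketch (8.9 matches the RTX 4090 in the report below, and MAX_JOBS caps parallel compile jobs):
```bash
# build only the sm_89 kernels instead of the full arch list
TORCH_CUDA_ARCH_LIST="8.9" MAX_JOBS=8 pip install -v -e .
```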
finally, verify the build with `python -m xformers.info`:
```
xFormers 0.0.24+6600003.d20240112
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@v2.3.6: available
memory_efficient_attention.flshattB@v2.3.6: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.2.0a0+81ea7a4
pytorch.cuda: available
gpu.compute_capability: 8.9
gpu.name: NVIDIA GeForce RTX 4090
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1230
build.python_version: 3.10.12
build.torch_version: 2.2.0a0+81ea7a4
build.env.TORCH_CUDA_ARCH_LIST: 5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.3.107
source.privacy: open source
```
@danthe3rd, hope it helps, and maybe you can pin this topic; I think there are many people who have failed to build from source.