"Expected is_sm80 to be true, but got false."

Do the nightlies work correctly? We're seeing this error when we train (without using the compiler):

Traceback (most recent call last):
  File "/root/src/transformers_custom/minGPT/build_model__openwebtext.py", line 89, in <module>
    main()
  File "/root/src/transformers_custom/minGPT/build_model__openwebtext.py", line 81, in main
    train(timestamp=timestamp)
  File "/root/src/transformers_custom/minGPT/build_model__openwebtext.py", line 69, in train
    trainer.train()
  File "/root/src/training/trainer.py", line 45, in train
    self.trainer_state.scaler.scale(loss).backward()
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected is_sm80 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

This is running on an NVIDIA A6000 card via the NVIDIA CUDA Docker image plus the latest nightly pip install of torch:

FROM nvcr.io/nvidia/cuda:11.7.1-devel-ubuntu22.04

pip3 install --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

sm_80 is listed among the NVCC architecture flags below:

python -c "import torch;print(torch.__config__.show(), torch.cuda.get_device_properties(0))"

PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
 _CudaDeviceProperties(name='NVIDIA RTX A6000', major=8, minor=6, total_memory=48669MB, multi_processor_count=84)

We are using torch 2.0.0.dev20230213+cu117.

I should also mention that this did work when using:

pip3 install --pre torchvision torch==2.0.0.dev20230201 --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

We also see a lot more DEBUG messages on the latest nightly… we're not sure how to turn them off.

The A6000 has minor compute capability 6, which would fail the is_sm80 check: pytorch/fmha_api.cpp at 989299802cf83f8e3634b34028ecf08d76746307 · pytorch/pytorch · GitHub

I guess it is failing on the head-dim check just above that line. Could you post a code snippet that reproduces the issue? If you are being dispatched to this broken path automatically, that is a bug that should be raised upstream.
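As a stopgap, if your attention goes through torch.nn.functional.scaled_dot_product_attention, you may be able to steer dispatch away from the flash kernel with the sdp_kernel context manager. A minimal sketch, assuming your calls route through SDPA (the tensor shapes here are made up for illustration):

import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 64, 128, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

# Disable the flash kernel so SDPA falls back to the mem-efficient or
# math implementation for both forward and backward.
with torch.backends.cuda.sdp_kernel(enable_flash=False,
                                    enable_math=True,
                                    enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v)
    out.sum().backward()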


Thanks! What does "A6000 has minor capability 6" mean?

I’ll put together a minimal example and add it to the thread.

It means that the A6000 is actually sm86 (compute capability 8.6), which would fail the is_sm80 check.

torch.cuda.get_device_properties(0)

_CudaDeviceProperties(name='NVIDIA RTX A6000', major=8, minor=6, total_memory=48669MB, multi_processor_count=84)
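Equivalently, torch.cuda.get_device_capability returns the (major, minor) pair directly:

import torch

# Compute capability of device 0; prints (8, 6) on an RTX A6000
print(torch.cuda.get_device_capability(0))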

Gotcha - so the A100 has minor == 0, which is probably what everyone else is using.

I just opened a ticket here with an example for reproducing the error:

RuntimeError: Expected is_sm80 to be true, but got false
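The gist is along these lines (a sketch, not the exact example from the ticket; it assumes the flash SDPA backward with head dim above 64 is what trips the sm80-only check on an sm86 card):

import torch
import torch.nn.functional as F

# head_dim = 128 > 64: on sm86 the flash backward requires sm80, so
# backward() should raise "Expected is_sm80 to be true, but got false."
q = torch.randn(1, 8, 256, 128, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

out = F.scaled_dot_product_attention(q, k, v)
out.sum().backward()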

Thanks for your help!
