PyTorch 2.1.0a0 build fails on Ubuntu 20.04 with CUDA 11.8

Hi,

I am trying to build PyTorch at commit 4805441b4a582b140a408f864403ab45680c8131 (the 2023.3.18 version). My environment uses GPU driver version 470.161.03 and the Docker image nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04. The build fails at this step:

[5996/6694] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k128.cu.o
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k128.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_CUDA -DUSE_EXPERIMENTAL_CUDNN_V8_API -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_NCCL -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -I/root/Documents/pytorch/build/aten/src -I/root/Documents/pytorch/aten/src -I/root/Documents/pytorch/build -I/root/Documents/pytorch -I/root/Documents/pytorch/cmake/../third_party/benchmark/include -I/root/Documents/pytorch/third_party/onnx -I/root/Documents/pytorch/build/third_party/onnx -I/root/Documents/pytorch/third_party/foxi -I/root/Documents/pytorch/build/third_party/foxi -I/root/Documents/pytorch/aten/src/THC -I/root/Documents/pytorch/aten/src/ATen/cuda -I/root/Documents/pytorch/aten/src/ATen/../../../third_party/cutlass/include -I/root/Documents/pytorch/build/caffe2/aten/src -I/root/Documents/pytorch/aten/src/ATen/.. -I/root/Documents/pytorch/build/nccl/include -I/root/Documents/pytorch/c10/cuda/../.. -I/root/Documents/pytorch/c10/.. 
-I/root/Documents/pytorch/torch/csrc/api -I/root/Documents/pytorch/torch/csrc/api/include -isystem=/root/Documents/pytorch/cmake/../third_party/googletest/googlemock/include -isystem=/root/Documents/pytorch/cmake/../third_party/googletest/googletest/include -isystem=/root/Documents/pytorch/third_party/protobuf/src -isystem=/root/anaconda3/include -isystem=/root/Documents/pytorch/third_party/gemmlowp -isystem=/root/Documents/pytorch/third_party/neon2sse -isystem=/root/Documents/pytorch/third_party/XNNPACK/include -isystem=/root/Documents/pytorch/third_party/ittapi/include -isystem=/root/Documents/pytorch/cmake/../third_party/eigen -isystem=/usr/local/cuda/include -isystem=/root/Documents/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem=/root/Documents/pytorch/third_party/ideep/include -isystem=/root/Documents/pytorch/third_party/ideep/mkl-dnn/include -isystem=/root/Documents/pytorch/cmake/../third_party/cudnn_frontend/include -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_89,code=compute_89 -gencode arch=compute_90,code=compute_90 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda  -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail 
-DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__  -g -lineinfo --source-in-ptx -Xcompiler=-fPIC -DTH_HAVE_THREAD -Xcompiler=-Wall,-Wextra,-Wno-unused-parameter,-Wno-unused-function,-Wno-unused-result,-Wno-missing-field-initializers,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-error=deprecated-declarations,-Wno-missing-braces,-Wno-maybe-uninitialized -std=c++17 -MD -MT caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k128.cu.o -MF caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k128.cu.o.d -x cu -c /root/Documents/pytorch/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k128.cu -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k128.cu.o
Segmentation fault (core dumped)

I tried to check whether this commit passed CI in this kind of environment at https://github.com/pytorch/pytorch/actions/runs/4259194535. However, it seems that this environment (Ubuntu 20.04, CUDA 11.8, cuDNN 8) is not covered by CI. Is there a reason why it is not tested? And should I switch to a different environment to build PyTorch?

Thanks in advance.