PyTorch Segmentation Fault (core dumped) when moving tensor to GPU

I have the following program:

import torch

tensor_cpu = torch.tensor([1, 2, 3, 4, 5])

print("Tensor device before moving to CUDA:", tensor_cpu.device)

if torch.cuda.is_available():
    device = torch.device("cuda")          
    tensor_cuda = tensor_cpu.to(device)    
    print("Tensor device after moving to CUDA:", tensor_cuda.device)
else:
    print("CUDA is not available. Cannot move tensor to CUDA.")

The output is:

Tensor device before moving to CUDA: cpu
Segmentation fault (core dumped)
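Because the crash kills the interpreter outright, Python prints no traceback of its own. Python's built-in faulthandler module, enabled with `-X faulthandler`, dumps the Python stack when the process receives SIGSEGV, which can show which Python line was executing. Below is a minimal sketch, not an official tool: `run_isolated` and `describe` are hypothetical helper names. It runs a snippet in a child interpreter with faulthandler enabled, so the calling script can report the signal instead of dying with it.

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_isolated(snippet: str, timeout: float = 120.0) -> Tuple[int, str]:
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, subprocess reports a child killed
    by signal N as returncode -N. Raises subprocess.TimeoutExpired on a hang.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


if __name__ == "__main__":
    # Same steps as the program above, run in a child process so a crash
    # inside libcuda does not take this script down with it.
    repro = (
        "import torch\n"
        "t = torch.tensor([1, 2, 3, 4, 5])\n"
        "print(t.to(torch.device('cuda')).device)\n"
    )
    code, err = run_isolated(repro)
    print(describe(code))
    # On SIGSEGV, faulthandler's "Fatal Python error: Segmentation fault"
    # report and the Python stack end up in stderr.
    print(err)
```

The same report can be obtained without the wrapper by running the script directly as `python -X faulthandler main.py`. The wrapper only adds a readable exit status.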

My setup:
GPU: NVIDIA RTX 6000 Ada Generation
NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2

I use the official image:
nvcr.io/nvidia/pytorch:21.04-py3
I also tried other images with newer CUDA versions:
22.12-py3, 23.04-py3

(I’m using PyTorch 1.x).
None helped. What could be the issue?
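The driver and CUDA versions above come from nvidia-smi on the host. The versions built into each container's PyTorch are a separate piece of information. PyTorch ships `python -m torch.utils.collect_env` for a full environment report. The smaller sketch below reads only the build metadata. `torch_build_info` is a hypothetical helper name. The sketch assumes that `torch.cuda.get_arch_list()` is present in the installed release and looks it up defensively in case it is not. It never moves data to the GPU. The only CUDA query it makes is the indirect `torch.cuda.is_available()` call inside `get_arch_list()`, and that call already succeeded in the program above.

```python
import platform
from typing import Dict, Optional


def torch_build_info() -> Dict[str, Optional[str]]:
    """Collect version info from inside the container without touching GPU memory.

    Fields stay None when PyTorch (or cuDNN) is not available.
    """
    info: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,  # CUDA version PyTorch was compiled against
        "cudnn": None,
        "arch_list": None,   # GPU architectures compiled into this build
    }
    try:
        import torch
    except ImportError:
        return info

    info["torch"] = str(torch.__version__)
    info["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.backends.cudnn.is_available():
        info["cudnn"] = str(torch.backends.cudnn.version())
    # get_arch_list() may be missing in very old releases, so look it up defensively.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is not None:
        info["arch_list"] = " ".join(get_arch_list())
    return info


if __name__ == "__main__":
    for key, value in torch_build_info().items():
        print(f"{key}: {value}")
```

Running this once in each of the 21.04, 22.12 and 23.04 images would show how their PyTorch builds differ.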

Could you check the stacktrace from gdb and post it here? Is any other CUDA application working, e.g. the CUDA samples?

Here is the gdb stacktrace:

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Reading symbols from python...
(No debugging symbols found in python)
(gdb) 
(gdb) run
Starting program: /usr/bin/python main.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 411]
[New Thread 0x7f8a02f80700 (LWP 412)]
[New Thread 0x7f89fef7f700 (LWP 413)]
[New Thread 0x7f89f8f7e700 (LWP 414)]
[New Thread 0x7f89f2f7d700 (LWP 415)]
[New Thread 0x7f89ecf7c700 (LWP 416)]
[New Thread 0x7f89e6f7b700 (LWP 417)]
[New Thread 0x7f89def7a700 (LWP 418)]
[New Thread 0x7f89d8f79700 (LWP 419)]
[New Thread 0x7f89d4f78700 (LWP 420)]
[New Thread 0x7f89ccf77700 (LWP 421)]
[New Thread 0x7f89c6f76700 (LWP 422)]
[New Thread 0x7f89c2f75700 (LWP 423)]
[New Thread 0x7f89bcf74700 (LWP 424)]
[New Thread 0x7f89b6f73700 (LWP 425)]
[New Thread 0x7f89aef72700 (LWP 426)]
[New Thread 0x7f89aaf71700 (LWP 427)]
[New Thread 0x7f89a4f70700 (LWP 428)]
[New Thread 0x7f899ef6f700 (LWP 429)]
[New Thread 0x7f8998f6e700 (LWP 430)]
[New Thread 0x7f8990f6d700 (LWP 431)]
[New Thread 0x7f898af6c700 (LWP 432)]
[New Thread 0x7f8986f6b700 (LWP 433)]
[New Thread 0x7f8980f6a700 (LWP 434)]
[New Thread 0x7f897af69700 (LWP 435)]
[New Thread 0x7f8974f68700 (LWP 436)]
[New Thread 0x7f896ef67700 (LWP 437)]
[New Thread 0x7f8968f66700 (LWP 438)]
[New Thread 0x7f8962f65700 (LWP 439)]
[New Thread 0x7f895af64700 (LWP 440)]
[New Thread 0x7f8956f63700 (LWP 441)]
[New Thread 0x7f8950f62700 (LWP 442)]
[New Thread 0x7f894af61700 (LWP 443)]
[New Thread 0x7f8944f60700 (LWP 444)]
[New Thread 0x7f893ef5f700 (LWP 445)]
[New Thread 0x7f8938f5e700 (LWP 446)]
[New Thread 0x7f8932f5d700 (LWP 447)]
[New Thread 0x7f892af5c700 (LWP 448)]
[New Thread 0x7f8926f5b700 (LWP 449)]
[New Thread 0x7f8920f5a700 (LWP 450)]
[New Thread 0x7f8918f59700 (LWP 451)]
[New Thread 0x7f8912f58700 (LWP 452)]
[New Thread 0x7f890ef57700 (LWP 453)]
[New Thread 0x7f8908f56700 (LWP 454)]
[New Thread 0x7f8900f55700 (LWP 455)]
[New Thread 0x7f88fcf54700 (LWP 456)]
[New Thread 0x7f88f6f53700 (LWP 457)]
[New Thread 0x7f88eef52700 (LWP 458)]
[New Thread 0x7f88eaf51700 (LWP 459)]
[New Thread 0x7f88e2f50700 (LWP 460)]
[New Thread 0x7f88dcf4f700 (LWP 461)]
[New Thread 0x7f88d6f4e700 (LWP 462)]
[New Thread 0x7f88d0f4d700 (LWP 463)]
[New Thread 0x7f88caf4c700 (LWP 464)]
[New Thread 0x7f88c6f4b700 (LWP 465)]
[New Thread 0x7f88bef4a700 (LWP 466)]
[New Thread 0x7f88b8f49700 (LWP 467)]
[New Thread 0x7f88b2f48700 (LWP 468)]
[New Thread 0x7f88acf47700 (LWP 469)]
[New Thread 0x7f88a8f46700 (LWP 470)]
[New Thread 0x7f88a0f45700 (LWP 471)]
[New Thread 0x7f889af44700 (LWP 472)]
[New Thread 0x7f8894f43700 (LWP 473)]
[New Thread 0x7f8890f42700 (LWP 474)]
Tensor device before moving to CUDA: cpu
[New Thread 0x7f887f998700 (LWP 475)]
[New Thread 0x7f885ffff700 (LWP 476)]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007f8a6c03bd11 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
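This session only shows the faulting frame inside libcuda.so.1. Typing `bt` at the (gdb) prompt after the SIGSEGV would also print the calling frames. A non-interactive alternative uses gdb's `-batch` mode, which also disables the "--Type <RET>" pager. Commands are passed with `-ex`, and the program's own arguments follow `--args`. The sketch below only builds and prints the command line. `gdb_backtrace_cmd` is a hypothetical helper, and `main.py` is the script from the question.

```python
import shlex
import sys
from typing import List


def gdb_backtrace_cmd(script: str, python: str = sys.executable) -> List[str]:
    """Build a non-interactive gdb command line that runs `python script`
    and prints backtraces once the program stops (e.g. on SIGSEGV)."""
    return [
        "gdb",
        "-batch",                      # no prompts; also turns off the pager
        "-ex", "run",                  # start the program under gdb
        "-ex", "bt",                   # backtrace of the thread that stopped
        "-ex", "thread apply all bt",  # backtraces of every thread
        "--args", python, script,      # everything after --args is the program's argv
    ]


if __name__ == "__main__":
    cmd = gdb_backtrace_cmd("main.py")
    # Print a shell-ready command; passing `cmd` to subprocess.run() would execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```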

When I clone NVIDIA/cuda-samples from GitHub and run make to build them, I get many compilation errors (again, in the official image mentioned above), for example:

reduction_kernel.cu(595): error: class "cooperative_groups::__v1::thread_block_tile<64U, cooperative_groups::__v1::thread_block>" has no member "meta_group_rank"
          detected during:
            instantiation of "void multi_warp_cg_reduce<T,BlockSize,MultiWarpGroupSize>(T *, T *, unsigned int) [with T=double, BlockSize=128UL, MultiWarpGroupSize=64UL]" 
(1011): here
            instantiation of "void reduce(int, int, int, int, T *, T *) [with T=double]" 
(1034): here

reduction_kernel.cu(601): error: class "cooperative_groups::__v1::thread_block_tile<64U, cooperative_groups::__v1::thread_block>" has no member "meta_group_size"
          detected during:
            instantiation of "void multi_warp_cg_reduce<T,BlockSize,MultiWarpGroupSize>(T *, T *, unsigned int) [with T=double, BlockSize=128UL, MultiWarpGroupSize=64UL]" 
(1011): here
            instantiation of "void reduce(int, int, int, int, T *, T *) [with T=double]" 
(1034): here

67 errors detected in the compilation of "reduction_kernel.cu".
make[1]: *** [Makefile:365: reduction_kernel.o] Error 255
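The samples do not build in this image, so another way to test the driver outside PyTorch is to call the CUDA Driver API directly. Below is a minimal sketch using ctypes. It assumes the driver library can be loaded as libcuda.so.1, the name that appears in the gdb output above. `probe_cuda_driver` and `format_driver_version` are hypothetical helper names. `cuInit`, `cuDriverGetVersion` and `cuDeviceGetCount` are real Driver API entry points, and each returns a CUresult in which 0 means CUDA_SUCCESS. A nonzero code, or a crash at this level, would suggest the problem is not specific to PyTorch.

```python
import ctypes
from typing import Dict, Optional

CUDA_SUCCESS = 0  # CUresult value for success in the CUDA Driver API


def probe_cuda_driver(libname: str = "libcuda.so.1") -> Optional[Dict[str, int]]:
    """Call cuInit, cuDriverGetVersion and cuDeviceGetCount directly.

    Returns None if the driver library cannot be loaded, otherwise a dict of
    raw CUresult codes plus the reported driver version and device count.
    """
    try:
        cuda = ctypes.CDLL(libname)
    except OSError:
        return None

    version = ctypes.c_int(0)
    count = ctypes.c_int(0)
    # ctypes defaults to an int return type, which matches the CUresult enum.
    init_rc = cuda.cuInit(0)
    version_rc = cuda.cuDriverGetVersion(ctypes.byref(version))
    count_rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    return {
        "cuInit": init_rc,
        "cuDriverGetVersion": version_rc,
        "cuDeviceGetCount": count_rc,
        "driver_version": version.value,
        "device_count": count.value,
    }


def format_driver_version(version: int) -> str:
    """Decode the 1000 * major + 10 * minor encoding used by cuDriverGetVersion."""
    return f"{version // 1000}.{(version % 1000) // 10}"


if __name__ == "__main__":
    info = probe_cuda_driver()
    if info is None:
        print("could not load libcuda.so.1")
    else:
        print(info)
        if info["cuDriverGetVersion"] == CUDA_SUCCESS:
            print("driver API version:", format_driver_version(info["driver_version"]))
```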

Are you seeing the build error in a current container or the old one?

In a new container I created just to test this, using the image mentioned above.