Hi there. I’m trying to run the Adversarial Example Generation tutorial (PyTorch Tutorials 1.8.1+cu102 documentation). Everything works fine for the first 651 pictures, and then I get a segmentation fault. I checked memory usage on the GPU (GTX 1050), and it seems fine. I also ran the same code on my friend’s GTX 1050 Ti, and it worked fine there. I reinstalled Ubuntu and did a clean setup of the driver and CUDA tools, but the problem is still there. I ran the code under the GNU debugger (gdb), and here is what I got after 651 pictures:
Thread 12 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffedcdb5700 (LWP 8697)]
0x00007fff32aa46ca in std::_Hashtable<at::native::ConvolutionParams, std::pair<at::native::ConvolutionParams const, cudnnConvolutionBwdDataAlgoPerf_t>, std::allocator<std::pair<at::native::ConvolutionParams const, cudnnConvolutionBwdDataAlgoPerf_t> >, std::__detail::_Select1st, at::native::ParamsEqual<at::native::ConvolutionParams>, at::native::ParamsHash<at::native::ConvolutionParams>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node(unsigned long, at::native::ConvolutionParams const&, unsigned long) const ()
from /home/muco/.local/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so
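The frame above is a hash-table lookup keyed by `at::native::ConvolutionParams` that stores `cudnnConvolutionBwdDataAlgoPerf_t` entries, i.e. a cache of cuDNN backward-data convolution algorithm choices, and gdb only shows that native frame. A minimal sketch for narrowing this down, assuming the tutorial's test loop is pasted into your own script: the standard-library `faulthandler` module dumps the Python stack of every thread on SIGSEGV, and a hypothetical `CUDNN_MODE` environment variable switches `torch.backends.cudnn` settings between runs. The mode names and helper functions are illustrative, not part of the tutorial.

```python
import faulthandler
import importlib.util
import os
import sys

# Dump the Python stack of every thread to stderr if the process receives
# SIGSEGV (faulthandler also covers SIGFPE, SIGABRT, SIGBUS and SIGILL).
faulthandler.enable(file=sys.stderr, all_threads=True)

# Hypothetical experiment modes: (enabled, benchmark, deterministic).
# "default" matches PyTorch's own defaults for torch.backends.cudnn.
CUDNN_MODES = {
    "default": (True, False, False),
    "benchmark": (True, True, False),
    "deterministic": (True, False, True),
    "disabled": (False, False, False),
}


def cudnn_flags(mode):
    """Map a mode name to the torch.backends.cudnn flags to set."""
    if mode not in CUDNN_MODES:
        raise ValueError(f"unknown CUDNN_MODE {mode!r}; choose one of {sorted(CUDNN_MODES)}")
    enabled, benchmark, deterministic = CUDNN_MODES[mode]
    return {"enabled": enabled, "benchmark": benchmark, "deterministic": deterministic}


def apply_cudnn_flags(flags):
    """Set the flags on torch.backends.cudnn; needs PyTorch installed."""
    import torch

    torch.backends.cudnn.enabled = flags["enabled"]
    torch.backends.cudnn.benchmark = flags["benchmark"]
    torch.backends.cudnn.deterministic = flags["deterministic"]


if __name__ == "__main__":
    flags = cudnn_flags(os.environ.get("CUDNN_MODE", "default"))
    print("cuDNN flags:", flags, flush=True)
    if importlib.util.find_spec("torch") is not None:
        apply_cudnn_flags(flags)
    # The tutorial's model setup and test loop would follow here (not shown).
```

Running, for example, `CUDNN_MODE=disabled python3 fgsm_debug.py` (script name hypothetical) and comparing it with the default run would suggest whether the crash at image 651 is tied to the cuDNN code path; on its own it would not rule out a hardware or driver issue. `python3 -X faulthandler` enables the same crash dump without editing the script.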
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
$ nvidia-smi
Thu May 20 16:12:04 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P0    N/A /  N/A |   1123MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       849      G   /usr/lib/xorg/Xorg                204MiB |
|    0   N/A  N/A      1332      G   budgie-wm                          23MiB |
|    0   N/A  N/A      1673      G   ...AAAAAAAAA= --shared-files       58MiB |
|    0   N/A  N/A      6801      G   ...AAAAAAAAA= --shared-files       61MiB |
|    0   N/A  N/A      8679      C   /usr/bin/python3                  769MiB |
+-----------------------------------------------------------------------------+
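The nvidia-smi output above is a single snapshot (python3 at 769MiB of 4040MiB). To check whether usage climbs before image 651, one option is to poll nvidia-smi's CSV query mode from a second terminal while the tutorial runs. A minimal sketch, assuming `nvidia-smi` is on `PATH`; `parse_memory_line` and `poll` are illustrative names.

```python
import shutil
import subprocess
import time

# Ask nvidia-smi for used/total memory only, as bare MiB numbers per GPU line.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_line(line):
    """Parse one 'used, total' line (MiB, no units) into (used, total, percent)."""
    used_text, total_text = (field.strip() for field in line.split(","))
    used, total = int(used_text), int(total_text)
    return used, total, 100.0 * used / total


def poll(interval_s=1.0, samples=10):
    """Print GPU memory usage every interval_s seconds for `samples` readings."""
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
        return
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        # One output line per GPU, in index order.
        for gpu_index, line in enumerate(out.strip().splitlines()):
            used, total, pct = parse_memory_line(line)
            print(f"GPU {gpu_index}: {used} / {total} MiB ({pct:.1f}%)", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    poll()
```

For the snapshot above, the line `1123, 4040` parses to 1123 MiB used of 4040 MiB, about 27.8%.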
cuDNN Version: 8.1.0
Ubuntu Version: 20.04
GCC Version: 9.3.0
Python Version: 3.8.5
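The versions listed above describe the system installation. The `from` path in the gdb output points into `/home/muco/.local/lib/python3.8/site-packages/torch/lib`, which suggests a pip install; PyTorch pip wheels typically bundle their own CUDA runtime and cuDNN libraries, so the versions PyTorch actually loads may differ from nvcc's 11.1 and cuDNN 8.1.0. A minimal sketch (the helper name is illustrative) that asks the installed PyTorch directly through `torch.__version__`, `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.get_device_name()`, falling back to `None` entries when PyTorch, cuDNN, or a GPU is unavailable.

```python
import importlib.util
import platform


def torch_cuda_report():
    """Collect version information as seen by the installed PyTorch."""
    report = {"python": platform.python_version(), "torch": None,
              "torch_cuda": None, "cudnn": None, "gpu": None}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not importable in this interpreter
    import torch

    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    if torch.backends.cudnn.is_available():
        report["cudnn"] = torch.backends.cudnn.version()  # returned as an integer
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

PyTorch also ships `python3 -m torch.utils.collect_env`, which prints a fuller environment report.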
I’m wondering: is this a hardware problem or some kind of bug?