"The kernel appears to have died. It will restart automatically" Error Untraceable

I am currently performing multi-label classification. When working on a larger dataset, I get the following message:

The kernel appears to have died. It will restart automatically.

Model Configuration:

import torch
import torch.nn as nn
import torchvision.models as tmodels


class model_init(nn.Module):
    def __init__(self):
        super(model_init, self).__init__()
        resnet18 = tmodels.resnet18(pretrained=True)
        resnet18_1 = tmodels.resnet18(pretrained=True)
        resnet18_2 = tmodels.resnet18(pretrained=True)

        # Shared trunk: conv1 through layer2 of the first ResNet-18.
        self.m1 = nn.Sequential(*(list(resnet18.children())[:-4]))

        # Two separate heads: layer3, layer4 and avgpool of the other two copies.
        self.m2 = nn.Sequential(*(list(resnet18_1.children())[-4:-1]))
        self.m3 = nn.Sequential(*(list(resnet18_2.children())[-4:-1]))

        self.m2_classifier = nn.Linear(512, 500, bias=True)
        self.m3_classifier = nn.Linear(512, 900, bias=True)

    def forward(self, img):
        x = self.m1(img)

        # (N, 512, 1, 1) -> (N, 512); squeeze also drops the batch dimension when N == 1.
        m2_feat = torch.squeeze(self.m2(x))
        m3_feat = torch.squeeze(self.m3(x))

        m2_prob = self.m2_classifier(m2_feat)
        m3_prob = self.m3_classifier(m3_feat)

        return m2_prob, m3_prob

The code runs fine on a smaller dataset with 125 classes for label 1 and 250 classes for label 2. The larger dataset has 500 classes for label 1 and 900 classes for label 2. I have monitored the GPU, RAM and swap usage of my RTX A5000 machine (24 GB GPU memory) with 128 GB RAM. GPU usage remains below 70%, and RAM and swap usage are also nominal. The code crashes on the very first batch, during the forward pass. The number of workers is set to 10 on a 20-core processor. Reducing the batch size down to 2 does not solve the problem.

Running the script under gdb with

gdb --args python3 script.py
(gdb) run

gives the following output:

[Screenshot of the gdb output]

I have tried to localize the problem, but the crash does not happen at a fixed location. The kernel sometimes crashes during the backward pass (the loss.backward() step) and sometimes during the forward pass through the model.

Complete System Configuration:
Package                  Version
------------------------ -------------
absl-py                  1.4.0
anyio                    3.6.2
apturl                   0.5.2
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
asttokens                2.2.1
attrs                    22.2.0
backcall                 0.2.0
bcrypt                   3.2.0
beautifulsoup4           4.11.1
bellmanford              0.2.1
bleach                   5.0.1
blinker                  1.4
Brlapi                   0.8.3
cachetools               5.3.0
certifi                  2020.6.20
cffi                     1.15.1
chardet                  4.0.0
click                    8.0.3
colorama                 0.4.4
comm                     0.1.2
command-not-found        0.3
contourpy                1.0.7
cryptography             3.4.8
cupshelpers              1.0
cycler                   0.11.0
Cython                   0.29.33
dbus-python              1.2.18
debugpy                  1.6.5
decorator                4.4.2
defer                    1.0.6
defusedxml               0.7.1
distro                   1.7.0
distro-info              1.1build1
duplicity                0.8.21
einops                   0.6.0
entrypoints              0.4
executing                1.2.0
fasteners                0.14.1
fastjsonschema           2.16.2
filelock                 3.9.0
fonttools                4.38.0
fqdn                     1.5.1
future                   0.18.2
google-auth              2.16.2
google-auth-oauthlib     0.4.6
grpcio                   1.51.3
httplib2                 0.20.2
huggingface-hub          0.12.0
idna                     3.3
importlib-metadata       4.6.4
ipykernel                6.20.2
ipython                  8.8.0
ipython-genutils         0.2.0
isoduration              20.11.0
jedi                     0.18.2
jeepney                  0.7.1
Jinja2                   3.1.2
joblib                   1.2.0
jsonpointer              2.3
jsonschema               4.17.3
jupyter_client           7.4.9
jupyter_core             5.1.3
jupyter-events           0.6.3
jupyter_server           2.1.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments      0.2.2
keyring                  23.5.0
kiwisolver               1.4.4
language-selector        0.1
launchpadlib             1.10.16
lazr.restfulclient       0.14.4
lazr.uri                 1.0.6
lockfile                 0.12.2
louis                    3.20.0
macaroonbakery           1.3.1
Mako                     1.1.3
Markdown                 3.4.1
MarkupSafe               2.1.2
matplotlib               3.6.3
matplotlib-inline        0.1.6
mistune                  2.0.4
monotonic                1.6
more-itertools           8.10.0
nbclassic                0.4.8
nbclient                 0.7.2
nbconvert                7.2.8
nbformat                 5.7.3
nest-asyncio             1.5.6
netifaces                0.11.0
networkx                 2.5.1
notebook                 6.5.2
notebook_shim            0.2.2
numpy                    1.24.1
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
oauthlib                 3.2.0
olefile                  0.46
opencv-contrib-python    4.7.0.68
packaging                23.0
pandocfilters            1.5.0
paramiko                 2.9.3
parso                    0.8.3
pexpect                  4.8.0
pickleshare              0.7.5
Pillow                   9.0.1
pip                      22.0.2
platformdirs             2.6.2
prometheus-client        0.15.0
prompt-toolkit           3.0.36
protobuf                 4.22.0
psutil                   5.9.4
ptyprocess               0.7.0
pure-eval                0.2.2
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pycairo                  1.20.1
pycparser                2.21
pycups                   2.0.1
Pygments                 2.14.0
PyGObject                3.42.1
PyJWT                    2.3.0
pymacaroons              0.13.0
PyNaCl                   1.5.0
pyparsing                2.4.7
pyRFC3339                1.1
pyrsistent               0.19.3
python-apt               2.4.0
python-dateutil          2.8.2
python-debian            0.1.43ubuntu1
python-json-logger       2.0.4
pytz                     2022.1
pyxdg                    0.27
PyYAML                   5.4.1
pyzmq                    25.0.0
reportlab                3.6.8
requests                 2.25.1
requests-oauthlib        1.3.1
rfc3339-validator        0.1.4
rfc3986-validator        0.1.1
rsa                      4.9
scikit-learn             1.2.1
scipy                    1.10.0
screen-resolution-extra  0.0.0
SecretStorage            3.3.1
Send2Trash               1.8.0
setuptools               59.6.0
six                      1.16.0
sklearn                  0.0.post1
sniffio                  1.3.0
soupsieve                2.3.2.post1
ssh-import-id            5.11
stack-data               0.6.2
systemd-python           234
tensorboard              2.12.0
tensorboard-data-server  0.7.0
tensorboard-plugin-wit   1.8.1
terminado                0.17.1
threadpoolctl            3.1.0
timm                     0.6.12
tinycss2                 1.2.1
torch                    1.13.1
torchaudio               0.13.1
torchvision              0.14.1
tornado                  6.2
tqdm                     4.64.1
traitlets                5.8.1
typing_extensions        4.4.0
ubuntu-advantage-tools   27.12
ubuntu-drivers-common    0.0.0
ufw                      0.36.1
unattended-upgrades      0.1
uri-template             1.2.0
urllib3                  1.26.5
usb-creator              0.3.7
wadllib                  1.3.6
wcwidth                  0.2.6
webcolors                1.12
webencodings             0.5.1
websocket-client         1.4.2
Werkzeug                 2.2.3
wheel                    0.37.1
xdg                      5
xkit                     0.0.0
zipp                     1.0.0

Note that I have already explored solutions at the following links:
Kernel have died it will restart automatically
The kernel appears to have died. It will restart automatically
"The kernel appears to have died"- Segmentation fault
Kernel dies on loss.backward()

Your screenshot is unfortunately too small to decipher it.
Could you try to run the code on the CPU only and see if a proper error message would be raised?
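
One way to force a CPU-only run without touching the model code is to hide the GPUs from the process before CUDA is initialized. This is only a minimal sketch, assuming the environment variable is set before `torch` touches CUDA; `pick_device` is a hypothetical helper and not part of PyTorch:

```python
import os

# Hide every CUDA device from this process. This has to happen before
# torch initializes CUDA, so keep it at the very top of the script.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def pick_device():
    """Hypothetical helper: return "cuda" only if torch can see a GPU."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The same effect should be available from the shell with `CUDA_VISIBLE_DEVICES="" python3 script.py`, which avoids editing the script at all.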

I have uploaded a clearer version of the output.

Note that F0, F1, etc. are printed by my own code.

The CPU execution takes considerable time. I will post the result once it is available.

When executed on the CPU, the code stops with a "Segmentation fault (core dumped)" message after one forward pass. @ptrblck

Could you check the backtrace in gdb and see where the segfault is raised from?
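
Alongside the gdb backtrace, Python's built-in `faulthandler` module can dump the Python-level traceback at the moment of the segfault, which may help narrow down the offending call. A minimal sketch follows; the log path and `run_training_step` are hypothetical placeholders for your own script:

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; the file object must stay open for as long
# as faulthandler is enabled.
LOG_PATH = os.path.join(tempfile.gettempdir(), "segfault_traceback.log")
log_file = open(LOG_PATH, "w")

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, dump the Python
# traceback of all threads into the log file.
faulthandler.enable(file=log_file, all_threads=True)


def run_training_step():
    """Placeholder for the forward/backward pass that crashes."""
    return "ok"


result = run_training_step()
```

Writing to a file is convenient in a notebook, where the kernel's stderr is not shown in the cell output. When running a plain script, `python3 -X faulthandler script.py` enables the same handler and writes to stderr.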

Please see the attached screenshot.

Thanks for the stacktrace (you can also post it by wrapping it into three backticks ``` for better readability). It doesn’t look familiar so could you post a minimal and executable code snippet to reproduce the issue, please?