I am using PyTorch version 1.4.0 on Amazon EC2 and I am running the following simple command:
>>> num = 2133; import torch; A = torch.ones(10, num); layer1 = torch.nn.Linear(num, 6); B = layer1(A); B.size()
torch.Size([10, 6])
which runs successfully. However, if I increase num by one, I get a segmentation fault and Python terminates.
>>> num = 2134; import torch; A = torch.ones(10, num); layer1 = torch.nn.Linear(num, 6); B = layer1(A); B.size()
Segmentation fault
[ec2-user@my_instance my_dir]$
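For readability, here is the same reproduction written as a standalone script (my assumption is that it behaves exactly like the REPL one-liner; the crash should happen on the forward pass):

import torch

num = 2134                        # 2133 works, 2134 segfaults on my machine
A = torch.ones(10, num)           # input batch: 10 x num, float32
layer1 = torch.nn.Linear(num, 6)  # weight: 6 x num, bias: 6
B = layer1(A)                     # forward pass -- this is where the crash occurs
print(B.size())                   # expected: torch.Size([10, 6])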
I think this happens because of a memory issue, but the tensors I am using are not particularly large (a rough size estimate is sketched below, after the system limits). These are the memory limits reported on my system:
[ec2-user@my_instance my_dir]$ ipcs -l

------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767
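To put numbers on "not particularly large", here is a quick sanity check of the tensors' memory footprint using element_size() and nelement() (this only counts the raw tensor data, not any allocator overhead, which I have not measured):

import torch

num = 2134
A = torch.ones(10, num)
layer1 = torch.nn.Linear(num, 6)

def tensor_bytes(t):
    # bytes occupied by the tensor's underlying data
    return t.element_size() * t.nelement()

print(tensor_bytes(A))              # 10 * 2134 * 4 bytes ~ 85 KB
print(tensor_bytes(layer1.weight))  # 6 * 2134 * 4 bytes ~ 51 KB
print(tensor_bytes(layer1.bias))    # 6 * 4 bytes = 24 bytes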
Any idea how to avoid this error?