Repeatedly generating random variables causes segmentation fault

Elias_Vansteenkiste · January 6, 2018, 10:51pm

I repeatedly generate random variables in a training procedure.
After a number of iterations I get a segmentation fault.

I tried two approaches:

x = torch.FloatTensor(batch_size, 512, 1, 1).normal_()

and

x = torch.randn(batch_size, 512, 1, 1)

Both approaches produce segmentation faults.

Here is the trace produced with gdb:

Thread 16 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe21e4700 (LWP 7008)]
0x00007fffc193e027 in THRandom_random () from /home/elias/.local/lib/python3.5/site-packages/torch/lib/libTH.so.1
(gdb) bt
#0  0x00007fffc193e027 in THRandom_random () from /home/elias/.local/lib/python3.5/site-packages/torch/lib/libTH.so.1
#1  0x00007fffc193e07e in THRandom_random64 () from /home/elias/.local/lib/python3.5/site-packages/torch/lib/libTH.so.1
#2  0x00007fffc193e1da in THRandom_normal () from /home/elias/.local/lib/python3.5/site-packages/torch/lib/libTH.so.1
#3  0x00007fffc1656cc0 in THFloatTensor_normal () from /home/elias/.local/lib/python3.5/site-packages/torch/lib/libTH.so.1
#4  0x00007fffd1fb3e2f in THPFloatTensor_normal_ (self=0x7fff86b9cb88, args=<optimized out>, kwargs=<optimized out>) at /pytorch/torch/csrc/generic/TensorMethods.cpp:57624
#5  0x00000000004e9ba7 in PyCFunction_Call ()
#6  0x00000000005372f4 in PyEval_EvalFrameEx ()
#7  0x0000000000540199 in ?? ()
#8  0x000000000053bd92 in PyEval_EvalFrameEx ()
#9  0x00000000004ed367 in ?? ()
#10 0x0000000000537791 in PyEval_EvalFrameEx ()
#11 0x0000000000540f9b in PyEval_EvalCodeEx ()
#12 0x00000000004ebe37 in ?? ()
#13 0x00000000005c1797 in PyObject_Call ()
#14 0x000000000053920b in PyEval_EvalFrameEx ()
#15 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#16 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#17 0x0000000000540f9b in PyEval_EvalCodeEx ()
#18 0x00000000004ebd23 in ?? ()
#19 0x00000000005c1797 in PyObject_Call ()
#20 0x00000000004fb9ce in ?? ()
#21 0x00000000005c1797 in PyObject_Call ()
#22 0x0000000000534d90 in PyEval_CallObjectWithKeywords ()
#23 0x0000000000609c02 in ?? ()
#24 0x00007ffff7bc16ba in start_thread (arg=0x7fffe21e4700) at pthread_create.c:333
#25 0x00007ffff78f73dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Is this a bug in pytorch?

Nvidia driver version: 384.90
CUDA v8.0.44
pytorch version: 0.3.0.post4

Elias_Vansteenkiste · January 7, 2018, 9:56am

Using numpy to generate the random numbers works without any segmentation faults.

x = torch.from_numpy(np.random.normal(loc=0.0, scale=1.0, size=(batch_size, 512, 1, 1)))

SimonW · January 8, 2018, 1:06am

What is your batch_size? Are you generating the random numbers from multiple threads?

Elias_Vansteenkiste · January 9, 2018, 11:19am

The batch size is 32.
I use the queue and threading packages for my custom data generator.

SimonW · January 12, 2018, 8:28pm

Yeah, our CPU RNG isn’t thread-safe at the moment. This is a high-priority task tracked at https://github.com/pytorch/pytorch/issues/3794. The GPU one should work fine (and is faster), so numpy and GPU should be sufficient for now!
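
For a threaded data generator like the one described above, the numpy workaround could be sketched roughly as follows. This is only an illustration, not code from the thread: the names noise_worker and generate_noise, the seeds, and the thread and batch counts are all hypothetical. Each thread owns its own np.random.RandomState, so no generator state is shared between threads, and the array is cast to float32 because np.random.normal returns float64 by default.

```python
import queue
import threading

import numpy as np

BATCH_SIZE = 32  # batch size reported earlier in the thread


def noise_worker(seed, n_batches, out_queue):
    """Hypothetical worker: fills out_queue with Gaussian noise batches."""
    # Each thread owns its RandomState, so no RNG state is shared.
    rng = np.random.RandomState(seed)
    for _ in range(n_batches):
        batch = rng.normal(loc=0.0, scale=1.0, size=(BATCH_SIZE, 512, 1, 1))
        # Cast to float32 so the array has the same dtype as a FloatTensor.
        out_queue.put(batch.astype(np.float32))


def generate_noise(n_threads=4, n_batches=8):
    """Run the workers and collect every batch they produce."""
    out_queue = queue.Queue()  # unbounded, so put() never blocks
    threads = [
        threading.Thread(target=noise_worker, args=(seed, n_batches, out_queue))
        for seed in range(n_threads)  # assumed seeding scheme: one seed per thread
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [out_queue.get() for _ in range(n_threads * n_batches)]
```

Each returned array can then be wrapped with torch.from_numpy(batch), which should give a FloatTensor since the array is already float32.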

YossiB · February 13, 2018, 11:26am

I tried to use the GPU alternative, but I couldn’t find one for randperm (PyTorch only has a CPU implementation).
Should I use numpy? Or has this issue been fixed in a newer version of PyTorch?

SimonW · February 14, 2018, 3:53am

randperm is CPU-only at the moment. Yeah, the issue is already fixed in master.
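
For anyone on a release that does not have the fix yet, a numpy-based stand-in for randperm might look like the sketch below. The helper name numpy_randperm and its optional rng argument are made up for illustration; the explicit int64 cast is there so the result maps to a LongTensor regardless of the platform's default integer size.

```python
import numpy as np


def numpy_randperm(n, rng=None):
    """Hypothetical helper: random permutation of 0..n-1 as an int64 array."""
    # Use the caller's RandomState if given (e.g. one per thread),
    # otherwise fall back to numpy's module-level generator.
    rng = np.random if rng is None else rng
    return rng.permutation(n).astype(np.int64)
```

The result can be converted with torch.from_numpy(numpy_randperm(n)) and used wherever the output of torch.randperm(n) was used before.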