Conv3d Problem: SIGSEGV (Signal 11)

I ran into a problem when using Conv3d to process a [32,32,32]-voxel cube. Here is the code:

layer1 = nn.Sequential(
    nn.AvgPool3d((2, 1, 1), stride=(2, 1, 1))
)
layer2_1 = nn.Sequential(
    nn.Conv3d(1, 64, (3, 3, 3), (1, 1, 1), (1, 1, 1)),
    nn.LeakyReLU(RL_leakyrate),
)

Here my input x is a Variable holding a double tensor (torch.DoubleTensor) of size [4,1,32,32,32].

Then I try 2 operations:

out=layer1(x)

This step runs normally and returns a [4,1,16,32,32] tensor of type double. Then I try the next operation:

out2=layer2_1(out)

And it returns the error message:

“Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).”

Then I tried to debug step by step, and I noticed that the crash happens in the source file functional.py, inside the function conv3d, at line 116:

return f(input,weight,bias)

When I run this line, the program exits with all variables gone. I could not find anyone who has reported this before, so I am raising the problem here to look for some help.
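As a side note, the tensor shapes in this thread are all consistent, which points at a crash inside the native convolution call rather than a Python-level size mismatch. A quick sanity check of both layers' expected output sizes using the standard output-size formula (the helper name is mine, not part of the thread's code):

```python
def out_shape(in_shape, channels, kernel, stride, padding=(0, 0, 0)):
    """Expected [N, C, D, H, W] output of a 3-D pool/conv layer.

    Applies the standard rule floor((size + 2*pad - kernel) / stride) + 1
    to each spatial dimension; `channels` is the output channel count.
    """
    n = in_shape[0]
    spatial = in_shape[2:]
    out = [(s + 2 * p - k) // st + 1
           for s, k, st, p in zip(spatial, kernel, stride, padding)]
    return [n, channels] + out

# layer1: AvgPool3d((2,1,1), stride=(2,1,1)) keeps the channel count (1):
print(out_shape([4, 1, 32, 32, 32], 1, (2, 1, 1), (2, 1, 1)))
# [4, 1, 16, 32, 32]

# layer2_1: Conv3d(1, 64, (3,3,3), (1,1,1), (1,1,1)) preserves spatial size:
print(out_shape([4, 1, 16, 32, 32], 64, (3, 3, 3), (1, 1, 1), (1, 1, 1)))
# [4, 64, 16, 32, 32]
```

So the pooled [4,1,16,32,32] tensor is exactly what Conv3d expects, and the expected conv output is [4,64,16,32,32].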

If more information is needed, please tell me and I will provide the details as clearly as possible.

Thanks

Hi Phil,

I tried this script on pytorch version 0.2.0 and on the pytorch master branch. It ran without errors on both.
Can you tell me what machine you are running this on? Is it an old computer?

import torch
import torch.nn as nn
from torch.autograd import Variable
layer1=nn.Sequential(
    nn.AvgPool3d((2,1,1), stride=(2,1,1))
)
layer2_1=nn.Sequential(
    nn.Conv3d(1,64,(3,3,3),(1,1,1),(1,1,1)),
    nn.LeakyReLU(0.1),
)

layer1.double()
layer2_1.double()

inp = Variable(torch.Tensor(4,1,32,32,32).double())
out=layer1(inp)
out2=layer2_1(out)


layer1.cuda()
layer2_1.cuda()

inp = Variable(torch.Tensor(4,1,32,32,32).double().cuda())
out=layer1(inp)
out2=layer2_1(out)

Thanks for the reply! I find that your code also returns the same error for me. When I run it like this:

import torch
import torch.nn as nn
from torch.autograd import Variable
layer1=nn.Sequential(
    nn.AvgPool3d((2,1,1), stride=(2,1,1))
)
layer2_1=nn.Sequential(
    nn.Conv3d(1,64,(3,3,3),(1,1,1),(1,1,1)),
    nn.LeakyReLU(0.1),
)

layer1.double()
layer2_1.double()
inp = Variable(torch.Tensor(4,1,32,32,32).double())
out=layer1(inp)
out2=layer2_1(out)

the same bug reappears at exactly this step, identical to what I described above:

out2=layer2_1(out)

I also tried

out=layer2_1(inp)

and I removed the nn.LeakyReLU(0.1) part from layer2_1 and tried

out=layer2_1(inp)

The same error reappears in exactly the same way.

I haven’t tried the code with .cuda(); maybe it is a bug only in the CPU version of pytorch? Or maybe I need to reinstall my pytorch installation?

I’m sorry for the late reply; thanks for your attention!

If possible, could you try the current master and see if the problem still persists? Thanks!
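For reference, a source build from master at that time would have looked roughly like this; this is only a sketch, and the exact steps depend on your environment and the instructions in the repository's README:

```shell
# Fetch the current master branch along with its submodules
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# Build and install into the currently active Python environment
python setup.py install
```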

Thanks for the reply!
My version is 0.2.0_1, and the problem still exists.
My computer doesn’t have a GPU, so I can only try the CPU version.
Do you have any advice?
I have tried 2 machines:

  1. Computer A, without a GPU, with the CPU version 0.2.0_3 installed
  2. Computer B, with a GPU, with the GPU version 0.2.0_3 installed

Then I ran the code above, and here is the result:
On computer A, the error reappears every time (after every reinstall).
On computer B, I tried the code both with and without `.cuda()`; the error vanishes and the code runs normally.

So I think it is a bug in the CPU version on a computer without a GPU, which needs to be fixed.

Well, I’m using Ubuntu 16, and my computer is a new one without a GPU, so my code runs on the CPU version of pytorch, version 0.2.0_1.

Well, if possible, could you try the latest master code (not just 0.2; try a manual install from the current github master)? I don’t have a non-GPU machine at hand. Thanks! :slight_smile:

@11117 I ran your code (the following) on a machine without a GPU. My torch.__version__ is 0.2.0_4 and no errors occurred. Could you update your pytorch or build from master and try again?

import torch
import torch.nn as nn
from torch.autograd import Variable
layer1=nn.Sequential(
    nn.AvgPool3d((2,1,1), stride=(2,1,1))
)
layer2_1=nn.Sequential(
    nn.Conv3d(1,64,(3,3,3),(1,1,1),(1,1,1)),
    nn.LeakyReLU(0.1),
)

layer1.double()
layer2_1.double()
inp = Variable(torch.Tensor(4,1,32,32,32).double())
out=layer1(inp)
out2=layer2_1(out)

Amazing!
I updated my version to 0.2.0_4 and the bug vanishes (using the source code on github).
Thanks a lot!

Amazing!
I tried your way and installed the latest version from the current github, and the bug vanishes!
(But the install page of pytorch still offers version 0.2.0_3, in which the bug is still present; should we update it?)
Thanks a lot! Now I can go on recommending pytorch in our lab!


0.2.0_3 is still our latest release. It will be updated when we release a newer version! There is a lot of new stuff, and many fixes/improvements, in master though. Feel free to try them! :slight_smile:
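For anyone comparing the version strings in this thread: the underscore suffix is a build number, so 0.2.0_1 predates 0.2.0_3 and 0.2.0_4. A small illustrative helper for ordering them (the function name is my own, purely for demonstration):

```python
def parse_build(version):
    """Split a '0.2.0_4'-style version string into a comparable tuple of ints.

    '0.2.0_4' -> (0, 2, 0, 4); a missing build suffix counts as build 0.
    """
    release, _, build = version.partition("_")
    return tuple(int(part) for part in release.split(".")) + (int(build or 0),)

print(parse_build("0.2.0_1") < parse_build("0.2.0_4"))  # True
print(parse_build("0.2.0_3") < parse_build("0.2.0_4"))  # True
```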