Please help me solve this error: Process finished with exit code -1073741819 (0xC0000005)

My environment is Python 3.6.9, Windows 10, torch 1.2.0, CUDA 9.2.

My problem: when I run a fully connected network, the code works well on both CPU and GPU.
But when it comes to a CNN, I can only train it on the CPU; it raises an error when I try to train it on the GPU.

like this: Process finished with exit code -1073741819 (0xC0000005)
I found that the error is raised when the code reaches loss.backward().

Please help me. Thanks a lot!

The error happens when I use the first line below instead of the second.

device = torch.device("cuda:0")

device = torch.device("cuda:0" if opt.cuda else "cpu")
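For reference, a minimal sketch of keeping everything on one device (torch.cuda.is_available() is the usual guard; Discriminator and real_cpu are names from this thread, the rest is assumed from a typical DCGAN-style script):

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
netD = Discriminator().to(device)  # move the model's parameters to the device
real = real_cpu.to(device)         # move each input batch to the same device
output = netD(real)                # forward pass entirely on one device

Every tensor that participates in the forward pass has to live on the same device as the model's parameters.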

Finally, I figured this out.

This error occurred simply because one of my variables was not loaded onto CUDA.

When I added output = Variable(netD(real_cpu), requires_grad=True), the problem was solved.
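As a side note, Variable has been deprecated since PyTorch 0.4; the same effect is usually achieved by moving the tensor itself (a sketch, assuming device is set up as above):

output = netD(real_cpu.to(device))  # autograd tracks gradients without the Variable wrapper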


Hi,

Out of curiosity, do you have custom C++ code in your backward?
PyTorch should never raise such an unfriendly error message.
If you don't have custom code, could you provide a small code snippet to reproduce this, please?

import torch.nn as nn

# nc (input channels) and ndf (base feature-map count) come from the poster's
# script; the values below are the DCGAN tutorial defaults.
nc = 3
ndf = 64

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        # flatten the (N, 1, 1, 1) output to a vector of N scores
        output = self.main(input)
        return output.view(-1, 1).squeeze(1)

Actually, the solution I provided doesn't work; I have to train my network on the CPU. This is the netD that cannot run backward on the GPU.
The full code is here: https://github.com/IkeYang/AndrewNg_ML_practise/blob/master/GAN
Furthermore, when I substitute the conv layers with linear layers, it works on the GPU.
That's quite weird.
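If it helps, here is a minimal snippet that should reproduce the crash on an affected setup (a sketch; it reuses the Discriminator, nc, and ndf from the post above and fabricates a random batch):

import torch
import torch.nn as nn

device = torch.device("cuda:0")
netD = Discriminator().to(device)
x = torch.randn(4, nc, 64, 64, device=device)   # random batch shaped like the expected input
out = netD(x)                                   # forward pass completes
loss = nn.BCELoss()(out, torch.ones_like(out))  # dummy all-ones target
loss.backward()                                 # the 0xC0000005 crash was reported here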

Which version of PyTorch are you using?

It's torch 1.2.0.
The rest of my environment is Python 3.6.9, Windows 10, CUDA 9.2.

Could you remove the inplace=True arguments and run the code again? The error message is quite unhelpful.

EDIT: might also be related to this topic.

That doesn't work either. Actually, the code runs fine until it reaches loss.backward(). The gradient can flow through the loss function but not through the network.
Additionally, this tutorial raises the same error: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html

In the end, I switched my Python to 3.5, and it works. Amazing… but my whole week is gone. :joy:

I also ran into this problem with CUDA 9.2, Windows 10, and Python 3.7, but switching to Python 3.5 as you suggested still didn't work.

Are you using an lr_scheduler?
https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.LambdaLR
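For context, a typical LambdaLR setup looks like this (a minimal sketch, not taken from the poster's code):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
# scale the base learning rate by 0.95 ** epoch
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

for epoch in range(5):
    optimizer.step()   # training step (loss computation omitted)
    scheduler.step()   # advance the schedule once per epoch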