Please help me solve this error: Process finished with exit code -1073741819 (0xC0000005)

My environment is Python 3.6.9, Windows 10, torch 1.2.0, CUDA 9.2.

My problem: when I run a fully connected network, the code works well on both CPU and GPU.
But when it comes to a CNN, I can only train it on the CPU; it raises an error when I try to train it on the GPU.

like this: Process finished with exit code -1073741819 (0xC0000005)
I found that the error is raised when the code reaches loss.backward().

Please help me. Thanks a lot!

The error happens when I use the first line below instead of the second.

device = torch.device("cuda:0")

device = torch.device("cuda:0" if opt.cuda else "cpu")
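For reference, a minimal sketch of keeping everything on one device (torch.cuda.is_available() is the usual guard; Discriminator and real_cpu are names from this thread, the rest is assumed from a typical DCGAN-style script):

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
netD = Discriminator().to(device)  # move the model's parameters to the device
real = real_cpu.to(device)         # move each input batch to the same device
output = netD(real)                # forward pass entirely on one device

Every tensor that participates in the forward pass has to live on the same device as the model's parameters.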

Finally, I figured this out.

This error occurred simply because one of my variables was not loaded onto CUDA.

When I added output = Variable(netD(real_cpu), requires_grad=True), the problem was solved.
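As a side note, Variable has been deprecated since PyTorch 0.4; the same effect is usually achieved by moving the tensor itself (a sketch, assuming device is set up as above):

output = netD(real_cpu.to(device))  # autograd tracks gradients without the Variable wrapper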


Hi,

Out of curiosity, do you have custom C++ code in your backward?
PyTorch should never raise such an unfriendly error message.
If you don't have custom code, could you provide a small code snippet to reproduce this, please?

import torch.nn as nn

# nc (input channels) and ndf (base feature-map count) come from the poster's
# script; the values below are the DCGAN tutorial defaults.
nc = 3
ndf = 64

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        # flatten the (N, 1, 1, 1) output to a vector of N scores
        output = self.main(input)
        return output.view(-1, 1).squeeze(1)

Actually, the solution I provided doesn't work; I have to train my network on the CPU. This is the netD that cannot run backward on the GPU.
The full code is here: https://github.com/IkeYang/AndrewNg_ML_practise/blob/master/GAN
Furthermore, when I substitute the conv layers with linear layers, it works on the GPU.
That's quite weird.
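If it helps, here is a minimal snippet that should reproduce the crash on an affected setup (a sketch; it reuses the Discriminator, nc, and ndf from the post above and fabricates a random batch):

import torch
import torch.nn as nn

device = torch.device("cuda:0")
netD = Discriminator().to(device)
x = torch.randn(4, nc, 64, 64, device=device)   # random batch shaped like the expected input
out = netD(x)                                   # forward pass completes
loss = nn.BCELoss()(out, torch.ones_like(out))  # dummy all-ones target
loss.backward()                                 # the 0xC0000005 crash was reported here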

Which version of PyTorch are you using?

It's torch 1.2.0.
The rest of my environment is Python 3.6.9, Windows 10, CUDA 9.2.

Could you remove the inplace=True arguments and run the code again? The error message is quite unhelpful.

EDIT: might also be related to this topic.

That doesn't work either. Actually, the code runs fine until it reaches loss.backward(). The gradient can flow through the loss function but not through the network.
Additionally, this tutorial raises the same error: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html

In the end, I switched my Python to 3.5, and it works. Amazing… but my whole week is gone. :joy:

I also ran into this problem with CUDA 9.2, Windows 10, and Python 3.7, but switching to Python 3.5 as you suggested still didn't work.

Are you using an lr_scheduler?
https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.LambdaLR
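For context, a typical LambdaLR setup looks like this (a minimal sketch, not taken from the poster's code):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
# scale the base learning rate by 0.95 ** epoch
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

for epoch in range(5):
    optimizer.step()   # training step (loss computation omitted)
    scheduler.step()   # advance the schedule once per epoch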