Loss.backward() with 'cuda' just ends the script without an error

JayHan · March 3, 2020, 4:58am

The problem is that when I use ‘cuda’, the Python script just exits at the loss.backward() line.
I wrote a simple script to test this and found that all of my Python scripts, which I was using happily until yesterday, now have the same problem.

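import torch
import torch.nn as nn
import torchvision.datasets as dset
import torchvision.transforms as transforms

from unet import UNet  # my own UNet model class (module name here is illustrative)

path = 'C:/data_main_2d'
epochs = 5
batch_size = 2
lr = 1e-1
# Setting this to torch.device('cpu') makes training run normally.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the images as single-channel tensors; ImageFolder's class labels are ignored below.
dataset = dset.ImageFolder(root=path,
                           transform=transforms.Compose([
                               transforms.Grayscale(),
                               transforms.ToTensor(),
                           ]))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)

model = UNet(n_channels=1, n_classes=1).to(device=device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-8)

for epoch in range(epochs):
    for img, _ in dataloader:
        # Variable is not needed since PyTorch 0.4; plain tensors work with autograd.
        img = img.to(device)

        output = model(img)
        loss = criterion(output, img)  # reconstruction loss against the input image

        optimizer.zero_grad()
        print('hi')  # this is printed, then on 'cuda' the script exits during backward()
        loss.backward()
        optimizer.step()

    # Loss of the last batch in this epoch
    print(loss.item())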

If I set the device to ‘cpu’, it works and runs through the epochs. However, if the device is ‘cuda’, the output looks like this:

(base) D:\My Drive\(0).BSMSvac\0.ML\>python train2.py
hi

(base) D:\My Drive\(0).BSMSvac\0.ML\>
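A Python process that ends with no traceback at all, as above, has often crashed inside native code (for example in the CUDA or cuDNN libraries) rather than raised a Python exception. The sketch below is one way to check that, assuming the training script is launched as a child process. `run_and_explain` and `WINDOWS_CRASH_CODES` are hypothetical names; `-X faulthandler` is a standard CPython option that dumps a Python traceback on fatal errors, and the table lists only a few well-known Windows NTSTATUS crash codes.

```python
import subprocess
import sys

# A few Windows NTSTATUS values that show up as exit codes when native code crashes.
WINDOWS_CRASH_CODES = {
    0xC0000005: "access violation",
    0xC00000FD: "stack overflow",
    0xC0000409: "stack buffer overrun / fail-fast",
}


def run_and_explain(script_args, timeout=None):
    """Run `python <script_args>` in a child process and describe how it exited.

    -X faulthandler makes the child print a Python traceback on fatal errors
    (segfaults, and Windows exceptions such as access violations) that would
    otherwise end the process without any message.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *script_args],
        capture_output=True, text=True, timeout=timeout,
    )
    code = proc.returncode
    if code == 0:
        reason = "clean exit"
    elif code < 0:
        # POSIX: a negative return code is the number of the signal that killed it.
        reason = f"killed by signal {-code}"
    else:
        reason = WINDOWS_CRASH_CODES.get(code, f"exit code {code}")
    return code, reason, proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python run_and_explain.py train2.py
    code, reason, stderr = run_and_explain(sys.argv[1:])
    print(f"{reason} (return code {code})")
    if stderr:
        print(stderr)
```

A normal Python error would give return code 1 with a traceback in `stderr`; a code such as 3221225477 (0xC0000005) on Windows, or a negative code on Linux, points at a crash below Python.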

My CUDA version is 9.2,
torch is 1.4.0,
and torchvision is 0.5.0.
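Since the versions matter here, the following is a small sketch that gathers them from inside Python. `environment_report` is a hypothetical helper; the attributes it reads (`torch.version.cuda`, `torch.backends.cudnn.version()`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`) are standard PyTorch. The prebuilt PyTorch packages come with their own CUDA runtime, so `torch.version.cuda` is the version PyTorch was built against, which may differ from the system-wide toolkit. PyTorch also ships a fuller report via `python -m torch.utils.collect_env`.

```python
def environment_report():
    """Collect the library versions most relevant to CUDA problems."""
    report = {}
    try:
        import torch
        import torch.backends.cudnn
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this interpreter
        return report

    report["torch"] = str(torch.__version__)
    # CUDA version this PyTorch build was compiled against (None on CPU-only
    # builds); it can differ from the CUDA toolkit installed system-wide.
    report["torch_cuda_build"] = torch.version.cuda
    report["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)

    try:
        import torchvision
        report["torchvision"] = torchvision.__version__
    except ImportError:
        report["torchvision"] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```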

I am confused because I have been doing ML actively for a month, and my code was working well.
But yesterday I found that the script just exits, with no errors or warnings.
Now none of my other scripts work either, and I can’t understand what the problem is.
The other scripts use a plain sequential convolutional network rather than a UNet, so the UNet is not the problem.
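Ruling out the model can be taken one step further by running the same tiny convolutional network under several configurations, each in its own child process so that a crash in one variant cannot kill the check itself. This is a sketch under assumptions: the network, the `bisect_backward` name, the variant names and the exit code 3 for "no GPU" are illustrative choices, while `torch.backends.cudnn.enabled` is the real switch for turning cuDNN off.

```python
import subprocess
import sys

NO_GPU = 3  # illustrative exit code the child uses when CUDA is not available

# Child program: one forward/backward pass through a tiny conv net (not a UNet).
CHILD = """
import sys
import torch
import torch.backends.cudnn
import torch.nn as nn
import torch.nn.functional as F

device = torch.device({device!r})
if device.type == "cuda" and not torch.cuda.is_available():
    sys.exit({no_gpu})
torch.backends.cudnn.enabled = {use_cudnn!r}

model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1)
).to(device)
x = torch.randn(2, 1, 64, 64, device=device)
loss = F.mse_loss(model(x), x)
loss.backward()
if device.type == "cuda":
    torch.cuda.synchronize()  # make asynchronous CUDA errors surface here
print("backward ok")
"""

VARIANTS = {
    "cpu": ("cpu", True),
    "cuda": ("cuda", True),
    "cuda_no_cudnn": ("cuda", False),
}


def bisect_backward(timeout=600):
    """Return {variant: child return code}; 0 means backward() completed."""
    results = {}
    for name, (device, use_cudnn) in VARIANTS.items():
        source = CHILD.format(device=device, use_cudnn=use_cudnn, no_gpu=NO_GPU)
        proc = subprocess.run(
            [sys.executable, "-X", "faulthandler", "-c", source],
            capture_output=True, text=True, timeout=timeout,
        )
        results[name] = proc.returncode
    return results


if __name__ == "__main__":
    for name, code in bisect_backward().items():
        print(f"{name:>14}: return code {code}")
```

If `cpu` returns 0 and `cuda` crashes but `cuda_no_cudnn` returns 0, cuDNN is the more likely suspect; if both CUDA variants crash, the driver or CUDA installation itself becomes the more likely cause. A return code of 1 usually means an ordinary Python error (for example PyTorch not being importable).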

JayHan · March 3, 2020, 8:26am

I solved this problem by uninstalling CUDA from my computer and installing it again.

I couldn’t find the exact cause, but it works well now.