The problem is that when I use ‘cuda’, my Python script just exits after the loss.backward() line.
I wrote a simple script to reproduce it, and I found that all of my Python scripts, which worked fine until yesterday, now have the same problem.
    import torch
    import torch.nn as nn
    import torchvision.datasets as dset
    import torchvision.transforms as transforms
    from torch.autograd import Variable
    # UNet is my own model class (definition not shown here).

    path = 'C:/data_main_2d'
    epochs = 5
    batch_size = 2
    lr = 1e-1
    device = torch.device('cpu')  # 'cuda' if torch.cuda.is_available() else

    # Load the images as single-channel tensors.
    dataset = dset.ImageFolder(root=path,
                               transform=transforms.Compose([
                                   transforms.Grayscale(),
                                   transforms.ToTensor(),
                               ]))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                             shuffle=True, num_workers=0)

    model = UNet(n_channels=1, n_classes=1).to(device=device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-8)

    for epoch in range(epochs):
        for data in dataloader:
            img, _ = data
            img = Variable(img).to(device)
            output = model(img)
            loss = criterion(output, img)  # reconstruct the input image
            optimizer.zero_grad()
            print('hi')
            loss.backward()  # with 'cuda', the script exits here
            optimizer.step()
            loss_data = loss.data
            print(loss_data)
If I set the device to ‘cpu’, it works and runs through the epochs. However, if the device is ‘cuda’, the output looks like this:
    (base) D:\My Drive\(0).BSMSvac\0.ML\>python train2.py
    hi

    (base) D:\My Drive\(0).BSMSvac\0.ML\>
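Since the terminal shows no error at all, the process exit code may be the only clue available. Below is a minimal sketch of how it could be checked using only the standard library. `run_and_report` is a hypothetical helper name, not something from my scripts. A nonzero exit code would suggest that the interpreter crashed rather than finished normally.

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command, echo its output, and return its exit code."""
    result = subprocess.run(args, capture_output=True, text=True)
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    print(f"exit code: {result.returncode}")
    return result.returncode
```

For example, `run_and_report([sys.executable, "-X", "faulthandler", "train2.py"])` would run the script above with Python's fault handler enabled, which asks the interpreter to dump a traceback if it dies on a fatal signal.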
My CUDA version is 9.2,
torch is 1.4.0,
and torchvision is 0.5.0.
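The CUDA version that the installed PyTorch binary was built against can differ from the toolkit installed on the system, so it may help to print what PyTorch itself reports. This is a small sketch, not a full diagnostic. `describe_torch_env` is a hypothetical helper name, and it returns early if torch cannot be imported.

```python
def describe_torch_env():
    """Collect the versions PyTorch reports, without raising if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    info = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        info["cudnn"] = torch.backends.cudnn.version()
    return info


print(describe_torch_env())
```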
I am so confused because I have been actively doing ML for a month, and my code worked well…
But yesterday I found that the script just ends in the terminal with no errors or warnings.
Also, all of my other scripts have stopped working now. I can’t understand what the problem is…
The other scripts do not use UNet; they are sequential convolutional networks, so UNet is not the problem…
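To separate the model and the data from the CUDA setup itself, a tiny backward pass with no dataset and no UNet can serve as a smoke test. The following is only a sketch under that assumption. `backward_smoke_test` is a hypothetical helper name, and it uses a single small convolution instead of any of my real networks. If this also exits silently on ‘cuda’, the problem is likely in the environment (driver, CUDA, or the PyTorch install) rather than in the training code.

```python
def backward_smoke_test(device_name="cuda"):
    """Run one tiny forward/backward pass; return the loss, or None if it cannot run."""
    try:
        import torch
    except ImportError:
        return None
    if device_name == "cuda" and not torch.cuda.is_available():
        return None
    device = torch.device(device_name)
    layer = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1).to(device)
    x = torch.randn(2, 1, 8, 8, device=device)
    loss = torch.nn.functional.mse_loss(layer(x), x)
    loss.backward()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work so failures surface here
    return loss.item()


print("cpu:", backward_smoke_test("cpu"))
print("cuda:", backward_smoke_test("cuda"))
```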