Help moving code to run on a GPU - Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_nll_loss_forward

import torch
from torchvision import datasets, models, transforms
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
train_transform = transforms.Compose([
                                transforms.Resize(256),
                                transforms.RandomResizedCrop(224),
                                transforms.RandomHorizontalFlip(),
                                transforms.ToTensor(),
                                transforms.Normalize(mean, std)])

test_transform = transforms.Compose([
                                transforms.Resize(256),
                                transforms.CenterCrop(224),
                                transforms.ToTensor(),
                                transforms.Normalize(mean, std)])
#we apply the random flips/crops only to the training data; the test data is just resized and center-cropped
pwd
'C:\\Users\\cam12\\datasets'
data_dir = 'datasetsToUse/HPV/'
#holds training and test data
image_datasets ={}
#ImageFolder lets us load images and apply a series of transformations to them:
#read everything from the train/test folders and apply the transforms defined above
image_datasets['train']= datasets.ImageFolder(data_dir + '/HPVTraining', train_transform)
image_datasets['test']= datasets.ImageFolder(data_dir + '/HPVVal', test_transform)
#both stored as dictionaries
print("Training data size - %d" %  len(image_datasets['train']))
print("Test data size - %d" %  len(image_datasets['test']))
Training data size - 24520
Test data size - 5996
class_names = image_datasets['train'].classes
class_names
['HPV+', 'HPV-']
image_datasets['train'].class_to_idx
#This shows that HPV+ is mapped to 0 and HPV- to 1
{'HPV+': 0, 'HPV-': 1}
dataloaders ={}
dataloaders['train'] = torch.utils.data.DataLoader(image_datasets['train'],
                                                   batch_size=8,
                                                   shuffle=True,
                                                   num_workers=4)
dataloaders['test'] = torch.utils.data.DataLoader(image_datasets['test'],
                                                  batch_size=8,
                                                  shuffle=True,
                                                  num_workers=4)
dataloaders
{'train': <torch.utils.data.dataloader.DataLoader at 0x27fe824e3a0>,
 'test': <torch.utils.data.dataloader.DataLoader at 0x27fe824ee50>}
inputs, labels = next(iter(dataloaders['train']))
inputs.shape
torch.Size([8, 3, 224, 224])
labels
#we have 2 class folders in our training folder, mapped to labels 0 and 1
tensor([0, 0, 0, 0, 0, 0, 0, 0])
import torchvision
#make_grid stacks the batch side by side so we can view it with matplotlib
inp = torchvision.utils.make_grid(inputs)
inp.shape
#the grid includes padding between the images, hence the 228 x 1810 size
torch.Size([3, 228, 1810])
inp.max()
#the maximum possible value after normalization is (1 - 0.406) / 0.225, roughly 2.64
tensor(2.5354)
import numpy as np

np.clip(inp.cpu(), 0, 1).max()
tensor(1.)
inp.numpy().transpose((1, 2, 0)).shape
(228, 1810, 3)
import matplotlib.pyplot as plt

plt.ion()
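The img_show helper called further down isn't included in the snippet above; a minimal sketch of such a helper, assuming it just undoes the Normalize transform and displays the image with a title, could look like this:

def img_show(inp, title=None):
    #hypothetical helper: undo the Normalize transform and display the image
    inp = inp.cpu().numpy().transpose((1, 2, 0))
    inp = np.array(std) * inp + np.array(mean)
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)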
model = models.resnet18(pretrained=True)
model.to(device)
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
#replace the last linear layer with one that matches our data; first we need the number of input features
num_ftrs = model.fc.in_features
num_ftrs
512
import torch.nn as nn
#takes in the 512 features extracted above and outputs 2 classes, HPV+ or HPV-
#This layer replaces the final layer of the pretrained model
model.fc = nn.Linear(num_ftrs, 2)
criterion = nn.CrossEntropyLoss()
criterion.to(device)
CrossEntropyLoss()
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), 
                      lr=0.001, 
                      momentum=0.9)
from torch.optim import lr_scheduler

exp_lr_scheduler = lr_scheduler.StepLR(optimizer, 
                                       step_size=7, 
                                       gamma=0.1)
def calculate_accuracy(phase, running_loss, running_corrects):

    epoch_loss = running_loss / len(image_datasets[phase])
    epoch_acc = running_corrects.double() / len(image_datasets[phase])

    print('{} Loss: {:.4f} Acc: {:.4f}'.format( phase, epoch_loss, epoch_acc))
    
    return (epoch_loss, epoch_acc)
def phase_train(model, criterion, optimizer, scheduler):
    
    
    model.train()
    running_loss = 0.0
    running_corrects = 0
    
    #setup for training, loop through inputs, zero gradients, enable for training
    #run predictions and extract predicted values, calculate loss
    #run backward and update optimiser
    for inputs, labels in dataloaders['train']:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        
        with torch.set_grad_enabled(True):
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            
            loss.backward()
            optimizer.step()
            
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)
    
    #step the scheduler once per epoch, not once per batch,
    #otherwise StepLR(step_size=7) would decay the learning rate every 7 batches
    scheduler.step()
    calculate_accuracy('train', running_loss, running_corrects)
#We are going to train the model and evaluate the accuracy,
#we will only save params that give best accuracy 
import copy
best_acc = 0.0
best_model_wts = None
def phase_test(model, criterion, optimizer):
    
    model.eval()
    running_loss = 0.0
    running_corrects = 0
    global best_acc, best_model_wts
    #best_acc and best_model_wts track the best accuracy seen so far and the corresponding model weights
    
    for inputs, labels in dataloaders['test']:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        #zero the gradients above, then run the forward pass with gradient tracking disabled
        with torch.no_grad():
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
        #get the predicted output and calculate the loss
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)
        
    epoch_loss, epoch_acc = calculate_accuracy('test', running_loss, running_corrects)
    #If we get a more accurate result, store the model params in best_model_wts
    if epoch_acc > best_acc:
        best_acc = epoch_acc
        best_model_wts = copy.deepcopy(model.state_dict())
        
    return best_model_wts
def build_model(model, criterion, optimizer, scheduler, num_epochs=10):
    #will be updated if we get a better accuracy
    best_model_wts = copy.deepcopy(model.state_dict())

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        
        phase_train(model, criterion, optimizer, scheduler)
        best_model_wts = phase_test(model, criterion, optimizer)
        #once we train on an epoch - we update our model wts
        print()
    
    print('Best test Acc: {:4f}'.format(best_acc))

    model.load_state_dict(best_model_wts)
    return model
#runs for 1 epoch just for demonstration purposes
print ("Got here")
model = build_model(model, 
                    criterion, 
                    optimizer, 
                    exp_lr_scheduler, 
                    num_epochs=1)
Got here
Epoch 0/0
----------



---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-143-05c4ef04f42d> in <module>
      1 #runs for 1 epoch just for demonstration purposes
      2 print ("Got here")
----> 3 model = build_model(model, 
      4                     criterion,
      5                     optimizer,


<ipython-input-142-fa210670951b> in build_model(model, criterion, optimizer, scheduler, num_epochs)
      7         print('-' * 10)
      8 
----> 9         phase_train(model, criterion, optimizer, scheduler)
     10         best_model_wts = phase_test(model, criterion, optimizer)
     11         #once we train on an epoch - we update our model wts


<ipython-input-138-9d206a49c1bd> in phase_train(model, criterion, optimizer, scheduler)
     16             outputs = model(inputs)
     17             _, preds = torch.max(outputs, 1)
---> 18             loss = criterion(outputs, labels)
     19 
     20             loss.backward()


~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),


~\anaconda3\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    945 
    946     def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 947         return F.cross_entropy(input, target, weight=self.weight,
    948                                ignore_index=self.ignore_index, reduction=self.reduction)
    949 


~\anaconda3\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2420     if size_average is not None or reduce is not None:
   2421         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2423 
   2424 


~\anaconda3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2216                          .format(input.size(0), target.size(0)))
   2217     if dim == 2:
-> 2218         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2219     elif dim == 4:
   2220         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)


RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_nll_loss_forward
with torch.no_grad():
    
    inputs, labels = next(iter(dataloaders['test']))
    inp = torchvision.utils.make_grid(inputs)
    
    #the model is on the GPU, so the inputs have to be moved there too
    outputs = model(inputs.to(device))
    _, preds = torch.max(outputs, 1)
    
    for j in range(len(inputs)):
        inp = inputs.data[j]
        img_show(inp, 'predicted:' + class_names[preds[j]])

Freezing a model’s layers

Here, we freeze all the layers in the network except the final layer.

We need to set requires_grad = False to freeze the parameters so that gradients are not computed for them in backward().

frozen_model = models.resnet18(pretrained=True)
frozen_model.to(device)
for param in frozen_model.parameters():
    param.requires_grad = False

We then add a new final layer, which is not frozen

frozen_model.fc = nn.Linear(num_ftrs, 2)

Add an optimizer - for only the final layer

There is no gradient calculation for the frozen layers, so we only pass the final layer's parameters to the optimizer

optimizer = optim.SGD(frozen_model.fc.parameters(), 
                      lr=0.001, 
                      momentum=0.9)

Create a scheduler for this model

Just like with the earlier model, we attach a scheduler to this optimizer to decay the learning rate over time

exp_lr_scheduler = lr_scheduler.StepLR(optimizer, 
                                       step_size=7, 
                                       gamma=0.1)

The loss calculation is the same CrossEntropyLoss as earlier

This is just included here as a reminder as we are not changing the definition

criterion = nn.CrossEntropyLoss()

Reset the best_acc score

This is so that the new frozen model can start from scratch

best_acc = 0.0

Build the new model with the frozen layers

This is similar to what we did previously, except that this time gradients are not computed for the frozen parameters when backward() is called

frozen_model = build_model(frozen_model, 
                           criterion, 
                           optimizer, 
                           exp_lr_scheduler, 
                           num_epochs=8)

We test a batch of images from the test set on this new model

with torch.no_grad():
    
    inputs, labels = next(iter(dataloaders['test']))
    inp = torchvision.utils.make_grid(inputs)
    
    #again, the inputs need to be on the same device as the model
    outputs = frozen_model(inputs.to(device))
    _, preds = torch.max(outputs, 1)
    
    for j in range(len(inputs)):
        inp = inputs.data[j]
        img_show(inp, 'predicted:' + class_names[preds[j]])


Your code is quite unreadable at the moment, which makes debugging hard.
You can post code snippets by wrapping them in three backticks ```.

Based on the error message, the model output or, more likely, the target tensor passed to nn.CrossEntropyLoss or nn.NLLLoss is on the CPU while it should be on the GPU.
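For example, a quick check like the following (just a debugging sketch, not part of the code above) would show whether every parameter and every batch actually ended up on the GPU:

#debugging sketch: list the devices the model parameters live on
print({p.device for p in model.parameters()})   #should only contain cuda:0

#and check a batch right after moving it, as done inside phase_train
inputs, labels = next(iter(dataloaders['train']))
inputs, labels = inputs.to(device), labels.to(device)
print(inputs.device, labels.device)             #should both be cuda:0

If that first set contains more than one device, part of the model was never moved.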

Sorry, I posted the thread out of frustration and didn’t really think about cleaning it up.

Does that look better now? All of the tensors I can think of that need it have had .to(device) called on them, as suggested in this article, and to me at least that's pretty much how my code looks.

https://towardsdatascience.com/pytorch-switching-to-the-gpu-a7c0b21e8a99

Any help is much appreciated!

I fixed the issue and it’s running just fine now.

model.fc = nn.Linear(num_ftrs, 2).to(device)

I created this layer after I had already put the model onto the GPU in the code above, and the new layer defaults to the CPU, so that just needed to be overridden.
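In other words, any layer created after the model has been moved has to be sent to the GPU as well. An alternative that avoids the issue entirely (just a sketch of the same idea) is to replace the head first and move the whole model afterwards:

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)   #replace the head while the model is still on the CPU
model = model.to(device)                        #moves every submodule, including the new fc, to the GPU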