Tutorial: Training a classifier -- Training on GPU

Hello everybody,

I’m following the tutorial about TRAINING ON GPU, and I want to push my model and my inputs/labels to the GPU, but I get an error and I don’t understand why it crashes!
In the exercise we can see that we can push to the GPU with these lines:

net.to(device)
inputs, labels = inputs.to(device), labels.to(device)

and with my personal program it crashes, just like with the downloaded file:

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'

Can you help me, please? :slight_smile:

Hi,
Yes, sorry… I thought there was a specific field to write the code, which is why I posted pictures! I will paste the code as you suggest when I am back at my computer, if you want to help me :slight_smile:
Thanks

Hello,

So you can find my code here:

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import time

tmps1=time.time()

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

device = torch.device("cuda:0")
print(device)
 
 
# Transfer the model (neural network) to the GPU
if device:
    net.to(device)
    print("Transferring the network to the GPU")
print("Network transfer to GPU done.")

for epoch in range(1):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):

        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


dataiter = iter(testloader)
images, labels = dataiter.next()


# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

print("debug1")
nb_cuda = torch.cuda.device_count()
print(nb_cuda)

print("Crash here")
outputs = net(images)
print("debug2")

_, predicted = torch.max(outputs, 1)
print("debug3")
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
print("debug4")

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data


        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
print("debug5")

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data

        outputs = net(images)

        _, predicted = torch.max(outputs, 1)

        c = (predicted == labels).squeeze()


        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

tmps2=time.time()-tmps1
print("Execution time = %f seconds" % tmps2)

As you can see, I check the progress of the program with different print("debug x") calls. As I have seen in the tutorial and on different forums, I have to push my model and my inputs/labels to the GPU.
When you run this code, the program stops just after print("Crash here") with the error: RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'
So I understand what this error is telling me, but I don’t understand what the solution is, because I do push my model and my inputs/labels… :frowning:
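
For reference, here is a small sanity check (just something I tried from the docs, nothing official) to see on which device my model and my test images actually live:

# Sanity check: on which device are the model weights and the test images?
print(next(net.parameters()).device)  # device of the model parameters
print(images.device)                  # device of the test images from testloader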

I hope you can help me :slight_smile:
Thanks

Here is my config:

Windows 10 64 bits
GeForce GTX 1080 Ti
PyTorch, latest version, installed 10/06/2018 (0.4.0)
CUDA 9.1

There is one mistake I noticed: after training, when you are passing images through your network, you should also transfer those images to the GPU, because your net is defined on the GPU. For instance, it should be

images, labels = images.to(device), labels.to(device)

anytime before you use net(images).
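
For instance, the evaluation part of your script could look like this (a minimal sketch based on the code you posted, I have not run it myself):

# Move the test batch to the same device as the model before the forward pass
images, labels = images.to(device), labels.to(device)
outputs = net(images)

# The same transfer is needed inside the accuracy loops:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)  # transfer to GPU
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()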

Oh yes! I absolutely agree with you! I added images, labels = images.to(device), labels.to(device) before net(images) and the process now runs to the end :slight_smile: So I tested this code with and without the GPU transfer, and it works, but when I measure the duration of the program I get GPU: 100 sec and CPU: 73 sec. I would have expected the opposite, with a much bigger gap, no? Normally, with a GPU, the program should only take a few seconds in a perfect world :slight_smile:

I am not an expert on how PyTorch works on the GPU, but my feeling is that it benefits the most when there are a lot of convolutional layers with a lot of filters (when you enable the GPU there is the time needed to transfer data from CPU to GPU, so if your computational gain is not large enough it can be overshadowed by the cost of that transfer). Try changing your network, for instance, to one with larger convolutional layers.
Also try increasing your batch size: PyTorch will probably be able to harness the GPU better by doing large matrix multiplications when the batch size is large. In Google Colaboratory you can experiment with batch sizes up to around 512.

For instance:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.conv3 = nn.Conv2d(32, 64, 5)
        self.fc1 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        x = x.view(-1, 64)
        x = self.fc1(x)
        return x
Here there are fewer linear layers but more convolutional layers with more filters.
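
To try a larger batch size as well, you could change the DataLoaders like this (256 is just an example value, adjust it to your GPU memory):

# Larger batches let the GPU do bigger matrix multiplications per step
trainloader = torch.utils.data.DataLoader(trainset, batch_size=256,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=256,
                                         shuffle=False, num_workers=2)

Note that with a bigger batch size the parts of the tutorial that assume 4 images per batch (the for j in range(4) prints and the for i in range(4) per-class loop) would need adjusting too.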

I tested the program with your model and I modified different parameters, such as batch size, width, and number of epochs. When I run the code I get nearly 500 sec on CPU against 70 sec on GPU, so I think this is the beginning of the solution!! :slight_smile: If someone has other advice, I'm happy to test it!

Even the one I have written is a very simple convolutional network. If you consider things like deep residual networks, the speedup can be on the order of 100x, provided you have enough RAM to fit everything. See for instance: https://github.com/iAvicenna/Residual-Network-Pytorch. I don't think it would even be sensible to try to run this on a CPU.

Yes, I tried running your program, and indeed my GPU works very well. But I have another question: what is your system config? As you can see above, I have a good system, and sometimes it still feels very slow! For example, when I run the script from the TRANSFER LEARNING tutorial, the program is supposed to finish in about 1 min 30, but for me it takes at least 5 minutes!!! :confused: I don't think there should be a difference like this between the PyTorch tutorial and my reality :confused:

https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

This is the link, and you can download the script at the end of the page :slight_smile:

Thanks a lot

Try it in colab.research.google.com, that is where I do most of my experimentation.

There are some points we could check to see why your script is slower than usual:

  • How did you install PyTorch? Via conda/pip or built from source?
  • Is your data locally stored on an SSD?
  • Is cuDNN enabled? (print(torch.backends.cudnn.enabled))
  • Could you monitor the power usage of your GPU in nvidia-smi? (I hope there is some equivalent for Windows) Maybe your GPU is overheating or your power supply is not sufficient.
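
A quick way to check the first points could be something like this (just a sketch, print whatever else you find useful):

import torch

# Basic GPU / CUDA diagnostics
print(torch.__version__)              # PyTorch version
print(torch.version.cuda)             # CUDA version PyTorch was built with
print(torch.cuda.is_available())      # is a CUDA device visible?
print(torch.cuda.get_device_name(0))  # name of the GPU in use
print(torch.backends.cudnn.enabled)   # is cuDNN enabled?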

Also, since you are using Windows, could you copy all the source code into a new script file and guard it with:

def main():
    # Code from tutorial


if __name__ == '__main__':
    main()

This shouldn’t be a problem, because the code runs on your machine, but might be worth a try.

I installed PyTorch with conda/pip!
My data is stored on my hard disk, a basic 1 TB drive, not an SSD (my computer is an HP Z840).
print(torch.backends.cudnn.enabled) returns True!
And my code already uses def main() on my computer :slight_smile: so that's not the problem…

OK, thanks for the info.
Could you try to time your data loading as shown in the ImageNet example?
Probably the data loading will take most of the time.
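
A minimal sketch of what that timing could look like in your training loop (the variable names are just examples):

import time

data_time = 0.0
end = time.time()
for i, data in enumerate(trainloader, 0):
    data_time += time.time() - end      # time spent waiting for the next batch
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
    # ... forward pass, loss, backward, optimizer step ...
    end = time.time()

print('Time spent loading data: %.1f seconds' % data_time)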

Hi, I am new to PyTorch. This thread helped me resolve some doubts, so I wanted to share my notebook, which is a cleaned-up version of what was discussed above. In addition, it automatically detects whether or not you have a GPU. I am running everything in Google Colab and it works well.

You can find the associated notebook here:
https://colab.research.google.com/drive/1Mjk_zqal7YMgGkWzLnhr7tL9wFW_eCCc