Algorithm doesn't run on GPU even after moving model and data to GPU. What am I missing?

Enes_Uguroglu · September 16, 2020, 11:20am

You can find training section of my code below:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Note: I get True when I check torch.cuda.is_available()

After creating the CNN model I wrote:

model = model.to(device)

Training Section:

import time
start_time = time.time()

epochs = 3

# Optional limits on the number of batches, for faster training
max_trn_batch = 800  # 10 images per batch → 8000 images total
max_tst_batch = 300  # 10 images per batch → 3000 images total

train_losses = []
test_losses = []
train_correct = []
test_correct = []

for i in range(epochs):

    trn_corr = 0
    tst_corr = 0

    for b, (X_train, y_train) in enumerate(train_loader):
        X_train, y_train = X_train.to(device), y_train.to(device)

        # Optional: limit the number of batches
        if b == max_trn_batch:
            break
        b = b + 1

        y_pred = model(X_train)
        loss = criterion(y_pred, y_train)

        predicted = torch.max(y_pred.data, 1)[1]
        batch_corr = (predicted == y_train).sum()
        trn_corr = trn_corr + batch_corr

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if b % 200 == 0:
            print('Epoch:  {} Loss:  {} Accuracy:  {}'.format(i, loss, trn_corr.item()*100/(10*b)))

    train_losses.append(loss)
    train_correct.append(trn_corr)

    # Test set
    with torch.no_grad():
        for b, (X_test, y_test) in enumerate(test_loader):
            X_test, y_test = X_test.to(device), y_test.to(device)

            # Optional: limit the number of batches
            if b == max_tst_batch:
                break
            y_val = model(X_test)
            predicted = torch.max(y_val.data, 1)[1]
            batch_corr = (predicted == y_test).sum()
            tst_corr = tst_corr + batch_corr

    loss = criterion(y_val, y_test)
    test_losses.append(loss)
    test_correct.append(tst_corr)

total_time = time.time() - start_time
print(f'Total Time: {total_time/60} minutes')

During training I monitor CPU and GPU usage: the CPU runs at 100% while the GPU stays at 1%.

Note 1: Training took 13 minutes with the CPU as the device and 7 minutes with the GPU, so there seems to be a small improvement, but I couldn't see any GPU utilization in Task Manager during training.

Note 2: Parameters

ConvolutionalNetwork(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=46656, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=2, bias=True)
)

Thanks in advance

Henry_Chibueze · September 16, 2020, 11:39am

What are your PyTorch and CUDA versions?

Flock1 · September 16, 2020, 12:28pm

I think you’re not assigning your CUDA device properly. The CUDA device is usually assigned like this:
cuda:<device number> (usually, it’s 0). You can get the device number by running this command:
torch.cuda.current_device()
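
A minimal sketch of the explicit device assignment suggested above, assuming a machine where the current CUDA device is the one you want to train on. It falls back to the CPU when CUDA is unavailable. Note that `torch.device('cuda')` without an index refers to the current CUDA device, so on a single-GPU machine both spellings should end up on the same GPU. The tensor `x` is only an illustration.

```python
import torch

# Pick an explicit GPU index when CUDA is available, otherwise fall back to the CPU.
if torch.cuda.is_available():
    index = torch.cuda.current_device()       # usually 0 on a single-GPU machine
    device = torch.device(f"cuda:{index}")
    print(torch.cuda.get_device_name(index))  # name of the GPU PyTorch will use
else:
    device = torch.device("cpu")

# Illustrative tensor: move it to the chosen device and confirm where it lives.
x = torch.zeros(2, 3).to(device)
print(x.device)  # cuda:<index> on a GPU machine, cpu otherwise
```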

Enes_Uguroglu · September 16, 2020, 1:38pm

PyTorch version is 1.6
CUDA version is 10.2.89

Enes_Uguroglu · September 16, 2020, 1:38pm

I assigned it as you mentioned and observed no difference unfortunately.

Flock1 · September 16, 2020, 3:21pm

Try this:

Enes_Uguroglu · September 16, 2020, 8:12pm

I still can't be sure whether I am really using the GPU. I searched online and found a suggestion to use nvidia-smi to check utilization during training; GPU utilization is 1 percent. This is my full code, by the way:

https://github.com/euguroglu/Machine-Learning-Projects/blob/master/Cuda_Pytorch_CNN_Chest_X_ray_classification.ipynb


ptrblck · September 18, 2020, 5:47am

You should get an error, if you try to push tensors or the model to the GPU and no GPU is available.
Also, once the data is transferred, you can check the device via print(X_test.device) and make sure it’s cuda:id.
Your GPU utilization might be that low because your model is really small, so the run time is dominated by overhead such as kernel launches, data loading, and processing.
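
The device check described above could look roughly like this. It is a sketch under assumptions: the `nn.Linear` model and the random `X_test` batch are hypothetical stand-ins for the CNN and the DataLoader batch in the question, and the code falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the CNN in the question; any nn.Module behaves the same way.
model = nn.Linear(8, 2).to(device)

# Hypothetical batch standing in for X_test from the DataLoader.
X_test = torch.randn(10, 8).to(device)

# Parameters and inputs each report the device they live on.
param_device = next(model.parameters()).device
print(X_test.device)   # cuda:<id> when the GPU is in use, cpu otherwise
print(param_device)

# The forward pass requires inputs and parameters to share a device.
assert X_test.device == param_device
y_val = model(X_test)
print(y_val.device)    # the output stays on the same device
```

If both prints show `cuda:<id>`, the computation is running on the GPU even when a system monitor reports low utilization.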

Enes_Uguroglu · September 18, 2020, 9:10am

Thank you very much for the answer. I increased the batch size to check whether that's the case and observed that utilization sometimes rose from 1% to 3%. So it really was about the size of my data.
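
One way to see the batch-size effect discussed here is to time a few training steps at different batch sizes. The sketch below makes several assumptions: the model is a hypothetical small CNN loosely modeled on the layer sizes printed in the question, the 64x64 inputs and labels are synthetic (so data loading cost is excluded), and `seconds_per_image` is an illustrative helper, not part of PyTorch. Because CUDA kernels launch asynchronously, it calls `torch.cuda.synchronize()` before reading the clock.

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical small CNN; 64x64 inputs give 16 * 14 * 14 features after two conv/pool stages.
model = nn.Sequential(
    nn.Conv2d(3, 6, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 2),
).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def seconds_per_image(batch_size, steps=5):
    """Average wall-clock time per image over a few training steps on synthetic data."""
    X = torch.randn(batch_size, 3, 64, 64, device=device)
    y = torch.randint(0, 2, (batch_size,), device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # finish pending work before starting the clock
    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / (steps * batch_size)


seconds_per_image(10, steps=1)  # warm-up run so one-time initialization is not timed

results = {bs: seconds_per_image(bs) for bs in (10, 100)}
for bs, t in results.items():
    print(f"batch size {bs}: {t * 1e6:.1f} microseconds per image")
```

On a GPU, the per-image time often drops as the batch grows, since the fixed per-step overhead is spread across more images; the actual numbers depend on the hardware.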