GPU is almost not being used while training, but data and model are on device

My GPU utilization is about 1% during training when I work with an image dataset passed to a DataLoader; increasing batch_size and num_workers does not help. However, when I work with CSV data and do not use a DataLoader (I pass the whole dataset through the model without batching), the GPU is used and everything works fine, but only if I make a Variable from the tensor; when I try to put the tensor on the device with tensor.to(device), nothing happens and it runs on the CPU. Thanks for the help, I have been trying to fix this for a long time.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from torchvision import datasets
train_path = r'D:\Datasets\fruits\5857_1166105_bundle_archive\fruits-360\Training'
transform =  transforms.Compose([
    transforms.Resize((32,32)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder(train_path, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_data,
    batch_size=200,
    num_workers=4,
    shuffle=True,
)
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16*6*6, 64)
        self.fc2 = nn.Linear(64, 131)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*6*6)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

device = torch.device("cuda")
net = Model().cuda()
import torch.optim as optim
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters())
from torch.autograd import Variable
for epoch in range(10):
    for i, data in enumerate(train_loader):
        inputs, labels = data[0], data[1]
        inputs = Variable(inputs).cuda()
        labels = Variable(labels).cuda()
        opt.zero_grad()  # clear gradients accumulated from the previous step
        out = net(inputs)
        loss = loss_fn(out, labels)
        loss.backward()
        opt.step()
    print(loss.item())

What is the resolution of your images? Are they in JPG format? I noticed with my CPU (an R5 2600X) that the CPU was already at its limit decoding the images, so I lowered the resolution of the images to 1000x1000 with an external image compressor and used the compressed images for the dataset. I assume your CPU spends most of its time on JPEG decoding.
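
If it helps, a rough sketch of that kind of offline pre-resize (the folder paths and target size here are placeholders, not your actual paths):

import os
from PIL import Image

src_root = r"path\to\fruits-360\Training"        # placeholder, original images
dst_root = r"path\to\fruits-360-small\Training"  # placeholder, resized copies

for dirpath, _, filenames in os.walk(src_root):
    out_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
    os.makedirs(out_dir, exist_ok=True)
    for name in filenames:
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            img = Image.open(os.path.join(dirpath, name)).convert("RGB")
            # decode once, shrink, and save, so the DataLoader workers only
            # ever have to decode small files during training
            img.resize((32, 32)).save(os.path.join(out_dir, name))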

They are resized to 32x32 pixels; that did not help, but my problem is with putting it on the GPU.

Hi,

Try to send opt to cuda using .cuda()

Also, Variable has been deprecated in PyTorch for the last few versions. You can send tensors to the GPU directly using inputs = inputs.cuda().
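
For example, a minimal version of your training loop without Variable could look like this (a sketch, assuming the same net, opt, loss_fn and train_loader as in your code):

for epoch in range(10):
    for inputs, labels in train_loader:
        # plain tensors can be moved to the GPU directly
        inputs = inputs.cuda()
        labels = labels.cuda()
        opt.zero_grad()
        out = net(inputs)
        loss = loss_fn(out, labels)
        loss.backward()
        opt.step()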

Also, could you close the other post, as it is the same question asked at the same time?
https://discuss.pytorch.org/t/pytorch-using-1-of-gpu-the-training-is-slow-almost-like-on-cpu/86757

Hello, thank you very much for your help, but when I try to send the optimizer to the GPU (.cuda()), I get an error:

AttributeError                            Traceback (most recent call last)
<ipython-input-10-e7bad7cb8ff5> in <module>
      1 import torch.optim as optim
      2 loss_fn = nn.CrossEntropyLoss()
----> 3 opt = optim.Adam(net.parameters()).cuda()

AttributeError: 'Adam' object has no attribute 'cuda'

import torch.optim as optim
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters()).cuda()

Remove .cuda() after the optim.Adam() call. I did not know you could send the optimizer to any device; it's usually only done for a model or tensors. Happy to be told otherwise.
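
i.e. just (a sketch):

opt = optim.Adam(net.parameters())  # no .cuda() here; the optimizer state is created on the same device as the parameters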

Hello, thank you for your suggestion, but I was told to try putting the optimizer on the GPU (opt.cuda()) because my code earlier in this conversation is barely using the GPU. Do you have any suggestions, please? I have not been able to make this work for a really long time.

Oh, sorry, I do not know why I said optim.cuda() so strictly! Thanks @harsha_g for pointing that out. I meant loss functions, but it still won't make any difference because there are no parameters in cross entropy.
Also, did you try removing Variable from your code and sending the inputs/outputs to the GPU using .cuda()?

Sending the model and input/output tensors to CUDA is enough. Can you check that, after sending the inputs/outputs to CUDA, they are really on CUDA? .device will help.
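
Something like this inside the training loop would confirm it (a sketch):

print(next(net.parameters()).device)  # should print cuda:0
print(inputs.device, labels.device)   # should both be cuda:0
print(torch.cuda.is_available())      # should print True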

See this thread; it has a lot of good responses to different issues: [SOLVED] Make Sure That Pytorch Using GPU To Compute

Would you mind sharing a few examples of your dataset and its structure, so I can run a few experiments?

If I am not wrong :wink:

I have tried everything: removing Variable, using .to(device). I read all the posts about this and none of them helped. But when I work with a CSV dataset and do not use a DataLoader, it uses the GPU at about 60%, like in this example:

import torch 
import torch.nn as nn
import numpy as np
import pandas as pd
from torch.utils.data import DataLoader
from torch.autograd import Variable
import torch.nn.functional as F
from sklearn import preprocessing

nor = preprocessing.MinMaxScaler()
train_path = r"D:\Datasets\house prices\train.csv"
test_path = r"D:\Datasets\house prices\test.csv"
train_file = pd.read_csv(train_path)
test_file = pd.read_csv(test_path)
train_file.drop("Id", axis=1, inplace=True)
test_file.drop("Id", axis=1, inplace=True)
train_file.fillna(train_file.median(), inplace=True)
test_file.fillna(train_file.median(), inplace=True)
y_train = train_file["SalePrice"].values
y_train.resize(1460, 1)
num_cols = list(train_file._get_numeric_data().columns)
num_cols.remove("SalePrice")
train_file = pd.get_dummies(train_file, drop_first=True, dummy_na=True)
test_file = pd.get_dummies(test_file, drop_first=True, dummy_na=True)
train_file = nor.fit_transform(train_file)
test_file = nor.fit_transform(test_file)
X_train = train_file
x1 = pd.DataFrame(train_file)
x2 = pd.DataFrame(test_file)
x2 = x2.align(x1, axis=1)[0]
X_test = x2.values

X_train = Variable(torch.from_numpy(X_train).float())
y_train = Variable(torch.from_numpy(y_train).float())
X_test = Variable(torch.from_numpy(X_test).float())

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(289, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, 128)
        self.fc4 = nn.Linear(128, 128)
        self.fc5 = nn.Linear(128, 1)
        self.drop = nn.Dropout(0.2)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.drop(F.relu(self.fc2(x)))
        x = self.drop(F.relu(self.fc3(x)))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        return x
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = Model().cuda()

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(net.parameters())

epochs = 10000
train_loss = 0
for epoch in range(epochs):
    preds = net(X_train.cuda())
    opt.zero_grad()
    loss = loss_fn(preds, y_train.cuda())
    loss.backward()
    opt.step()
    train_loss += loss.item()
    if epoch % 100 == 0:
        print(f"Loss: {loss.item()}")

This works with no problems, so I think there might be some problem with the DataLoader when passing images to the model. I really appreciate your help and time, thank you so much.

I just ran the code with the data and I find no reason to believe there's anything wrong with the code. I ran it for 100 epochs with a batch size of 2500 and I see a peak of 60% on my GPU (a Tesla P100). Each epoch takes at most 5 seconds.
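
If you want to rule out the input pipeline on your machine, you could also time the DataLoader on its own, without any GPU work at all (a rough sketch):

import time

start = time.time()
for i, (inputs, labels) in enumerate(train_loader):
    if i == 20:  # ~20 batches is enough for a rough estimate
        break
print(f"20 batches took {time.time() - start:.1f} s without touching the model")

If that alone is already slow, the bottleneck is image decoding on the CPU rather than anything on the GPU side.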

When I tried to run it with a batch size of 2500, it did not use the GPU at all; the only thing it used was RAM. I have a GTX 1050 Ti GPU, and TensorFlow runs with no problem, yet I did not even get through 1 epoch in 5 minutes. I do not know what the problem might be; this is like a nightmare, nothing I try solves it.

Are you saying that even after 5 minutes, you did not finish at least 1 epoch?

Yes, it is so slow, I do not know why. I will try to run it in Colab and see if the problem is in my PC.

Now I tried it with the CIFAR-10 dataset and it used the GPU only sometimes, at most 20%, and it was very slow.

I reinstalled torch and now CIFAR-10 from the PyTorch datasets uses the GPU at 50%, but there is no improvement on the fruit dataset. I do not think the dataset itself is the problem, though, because a TensorFlow model worked fine with this fruit dataset.

Please, can anyone help with this? I have been stuck on this CNN for a long time.

I have just found out that there is no problem with the model or CUDA; there must be a problem with the Dataset or the DataLoader, because CIFAR-10 from PyTorch runs with no problem. I do not think the dataset itself is the problem, because there are a lot of models on GitHub using fruit datasets, and it also does not work with the dogs vs. cats dataset, so I think there is some problem with custom datasets and the DataLoader. Do you have any suggestions, please? Thanks for your help.

@Jaredeco Did you try playing with the num_workers argument? :point_right: Guidelines for assigning num_workers to DataLoader
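
For example (the exact values are machine-dependent guesses to experiment with, not a known fix):

train_loader = torch.utils.data.DataLoader(
    train_data,
    batch_size=200,
    shuffle=True,
    num_workers=4,    # try 0, 2, 4, 8 and compare epoch times
    pin_memory=True,  # speeds up host-to-GPU copies
)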

Other than that, I don’t really have much to add. I hope your issue is resolved quickly :+1:.