GPU is barely used during training, even though the data and model are on the device

Jaredeco · June 24, 2020, 9:35am

My GPU utilization is about 1% while training on an image dataset loaded through DataLoader, and increasing batch_size and num_workers does not help. However, when I work with CSV data and do not use DataLoader (I pass the whole dataset through the model at once instead of in batches), the GPU is used and everything works fine. That only works if I wrap the tensor in a Variable, though: when I try to put the tensor on the device with tensor.to(device), nothing happens and it runs on the CPU. Thanks for any help; I have been trying to fix this for a long time.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from torchvision import datasets
train_path = r'D:\Datasets\fruits\5857_1166105_bundle_archive\fruits-360\Training'
transform =  transforms.Compose([
    transforms.Resize((32,32)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder(train_path, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_data,
    batch_size=200,
    num_workers=4,
    shuffle=True,
)
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16*6*6, 64)
        self.fc2 = nn.Linear(64, 131)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*6*6)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

device = torch.device("cuda")
net = Model().cuda()
import torch.optim as optim
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters())
from torch.autograd import Variable
for epoch in range(10):
    for i, data in enumerate(train_loader):
        inputs, labels = data[0], data[1]
        inputs = Variable(inputs).cuda()
        labels = Variable(labels).cuda()
        opt.zero_grad()  # clear gradients left over from the previous step
        out = net(inputs)
        loss = loss_fn(out, labels)
        loss.backward()
        opt.step()
    print(loss.item())  # loss of the last batch in this epoch

TheDoctor · June 24, 2020, 2:50pm

What is the resolution of your images? Are they in JPG format? With my R5 2600X, I noticed that the CPU was already at its limit just decoding the images, so I downscaled them to 1000x1000 with an external image compressor and used the compressed images for the dataset. I assume your CPU spends most of its time decoding JPGs.
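One way to test this hypothesis is to measure how much of each epoch is spent waiting for the DataLoader versus running the model. A minimal sketch, assuming a loader that yields (inputs, labels) pairs; profile_epoch is a hypothetical helper name, and the random tensors and tiny model only stand in for the real dataset and network:

```python
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def profile_epoch(model, loader, loss_fn, opt, device):
    """Split one epoch's wall time into 'waiting for the next batch' and 'compute'."""
    data_time, compute_time = 0.0, 0.0
    end = time.perf_counter()
    for inputs, labels in loader:
        t0 = time.perf_counter()
        data_time += t0 - end  # time spent blocked on the DataLoader
        inputs, labels = inputs.to(device), labels.to(device)
        opt.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        opt.step()
        if device.type == "cuda":
            torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait so timing is real
        end = time.perf_counter()
        compute_time += end - t0
    return data_time, compute_time


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Hypothetical stand-in data: 32x32 RGB "images" with 131 classes
    ds = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 131, (1000,)))
    loader = DataLoader(ds, batch_size=200, shuffle=True)
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 131)).to(device)
    opt = torch.optim.Adam(model.parameters())
    d, c = profile_epoch(model, loader, nn.CrossEntropyLoss(), opt, device)
    print(f"data: {d:.3f}s  compute: {c:.3f}s")
```

If the data time dominates, the GPU is sitting idle while the CPU prepares batches, which would match the low utilization described above.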

Jaredeco · June 24, 2020, 4:23pm

They are resized to 32x32 pixels, so that did not help. My problem is getting the computation onto the GPU.

Nikronic · June 24, 2020, 6:50pm

Hi,

Try to send opt to cuda using .cuda()

Also, Variable has been deprecated for the last few versions of PyTorch (since 0.4 it simply returns a tensor). You can send tensors to the GPU directly using inputs = inputs.cuda().
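Note the reassignment in inputs = inputs.cuda(): Tensor.to() and Tensor.cuda() are not in-place. They return a new tensor and leave the original where it was, which may be what the question describes as "nothing happens" with tensor.to(device). A minimal sketch; the first half uses a dtype conversion only so that it also runs on a CPU-only machine, and the same rule holds for devices:

```python
import torch
import torch.nn as nn

t = torch.zeros(3)
t.to(torch.float64)              # returns a NEW tensor; the result is discarded here
assert t.dtype == torch.float32  # t itself is unchanged

t = t.to(torch.float64)          # assign the result back to keep the converted tensor
assert t.dtype == torch.float64

# Same rule for devices: tensor.to(device) / tensor.cuda() do not move the tensor
# in place, so the result has to be assigned back.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(2, 2)
x = x.to(device)
print(x.device)

# nn.Module.to() is different: it moves the module's parameters in place
# (and also returns the module), so `net.to(device)` on its own is enough.
net = nn.Linear(2, 2)
net.to(device)
print(next(net.parameters()).device)
```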

Also, could you close your other post, since it asks the same question?
https://discuss.pytorch.org/t/pytorch-using-1-of-gpu-the-training-is-slow-almost-like-on-cpu/86757

Jaredeco · June 25, 2020, 5:55am

Hello, thank you very much for your help, but when I try to send the optimizer to the GPU (.cuda()) I get an error:

AttributeError                            Traceback (most recent call last)
<ipython-input-10-e7bad7cb8ff5> in <module>
      1 import torch.optim as optim
      2 loss_fn = nn.CrossEntropyLoss()
----> 3 opt = optim.Adam(net.parameters()).cuda()

AttributeError: 'Adam' object has no attribute 'cuda'


harsha_g · June 25, 2020, 5:56am

Remove .cuda() after the optim.Adam() call. I did not know you could send an optimizer to a device; that is usually reserved for models and tensors. Happy to be told otherwise.
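A short sketch of the usual pattern (an illustration, not code from this thread): move the model first, then build the optimizer from its parameters. Adam has no .cuda() method, and the per-parameter state it creates on the first step (exp_avg, exp_avg_sq) is allocated on the same device as each parameter:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Linear(4, 2).to(device)          # move the model first...
opt = torch.optim.Adam(net.parameters())  # ...then build the optimizer from its parameters

assert not hasattr(opt, "cuda")  # optimizers have no .cuda() method

# Take one step so Adam creates its per-parameter state.
loss = net(torch.randn(8, 4, device=device)).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()

state = opt.state[net.weight]
print(state["exp_avg"].device)  # same device as net.weight
```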

Jaredeco · June 25, 2020, 6:03am

Hello, thank you for your suggestion. I was told to try putting the optimizer on the GPU (opt.cuda()) because my code earlier in this thread barely uses the GPU. Do you have any other suggestions? I have not been able to make this work for a really long time.

Nikronic · June 25, 2020, 6:50am

Oh, sorry, I do not know why I said opt.cuda() so confidently! Thanks @harsha_g for pointing that out. I meant loss functions, but even that won't make any difference, because CrossEntropyLoss has no parameters.
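To illustrate the point (a sketch, not from the original post): a plain CrossEntropyLoss owns no parameters or buffers, so moving it changes nothing. If class weights are passed, weight is stored as a buffer, and the loss module then has to be on the same device as the logits:

```python
import torch
import torch.nn as nn

plain = nn.CrossEntropyLoss()
# Both counts are 0: there is nothing to move.
print(len(list(plain.parameters())), len(list(plain.buffers())))

# With class weights, `weight` is registered as a buffer, so the loss module
# must live on the same device as the logits.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
weighted = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 0.5])).to(device)
logits = torch.randn(4, 3, device=device)
labels = torch.tensor([0, 1, 2, 1], device=device)
print(weighted(logits, labels).item())
```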
Also, did you try removing Variable from your code and sending the input/output tensors to the GPU using .cuda()?

Sending the model and the input/output tensors to CUDA is enough. Can you check that, after sending them to CUDA, they really are on CUDA? Their .device attribute will tell you.
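A minimal sketch of that check. report_devices is a hypothetical helper name, and the small model and random batch are stand-ins; in the real loop it would be called on the first batch right after the .cuda() or .to(device) calls:

```python
import torch
import torch.nn as nn


def report_devices(net, inputs, labels):
    """Return where the model parameters and one batch actually live."""
    return {
        "cuda_available": torch.cuda.is_available(),
        "model": {p.device for p in net.parameters()},
        "inputs": inputs.device,
        "labels": labels.device,
    }


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Linear(10, 3).to(device)
inputs = torch.randn(5, 10).to(device)
labels = torch.randint(0, 3, (5,)).to(device)
print(report_devices(net, inputs, labels))
```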

See this thread; it has many good answers covering different issues: [SOLVED] Make Sure That Pytorch Using GPU To Compute

Nikronic · June 25, 2020, 6:54am

Would you mind sharing a few examples from your dataset and its structure, so I can run a few experiments?

harsha_g · June 25, 2020, 6:55am

If I am not wrong

Jaredeco · June 25, 2020, 7:25am

I have tried everything: removing Variable, using .to(device). I read all the posts about this and none of them helped. But when I work with a CSV dataset and do not use DataLoader, the GPU is used at about 60%, as in this example:

import torch 
import torch.nn as nn
import numpy as np
import pandas as pd
from torch.utils.data import DataLoader
from torch.autograd import Variable
import torch.nn.functional as F
from sklearn import preprocessing

nor = preprocessing.MinMaxScaler()
train_path = r"D:\Datasets\house prices\train.csv"
test_path = r"D:\Datasets\house prices\test.csv"
train_file = pd.read_csv(train_path)
test_file = pd.read_csv(test_path)
train_file.drop("Id", axis=1, inplace=True)
test_file.drop("Id", axis=1, inplace=True)
train_file.fillna(train_file.median(numeric_only=True), inplace=True)  # medians of numeric columns only
test_file.fillna(train_file.median(numeric_only=True), inplace=True)
y_train = train_file["SalePrice"].values.reshape(-1, 1)  # column vector, shape (n_samples, 1)
num_cols = list(train_file._get_numeric_data().columns)
num_cols.remove("SalePrice")
train_file = pd.get_dummies(train_file, drop_first=True, dummy_na=True)
test_file = pd.get_dummies(test_file, drop_first=True, dummy_na=True)
train_file = nor.fit_transform(train_file)
test_file = nor.fit_transform(test_file)
X_train = train_file
x1 = pd.DataFrame(train_file)
x2 = pd.DataFrame(test_file)
x2 = x2.align(x1, axis=1)[0]
X_test = x2.values

X_train = Variable(torch.from_numpy(X_train).float())
y_train = Variable(torch.from_numpy(y_train).float())
X_test = Variable(torch.from_numpy(X_test).float())

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(289, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, 128)
        self.fc4 = nn.Linear(128, 128)
        self.fc5 = nn.Linear(128, 1)
        self.drop = nn.Dropout(0.2)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.drop(F.relu(self.fc2(x)))
        x = self.drop(F.relu(self.fc3(x)))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        return x
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = Model().to(device)

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(net.parameters())

# Move the full training set to the device once instead of copying it every epoch
X_train = X_train.to(device)
y_train = y_train.to(device)

epochs = 10000
train_loss = 0
for epoch in range(epochs):
    opt.zero_grad()
    preds = net(X_train)
    loss = loss_fn(preds, y_train)
    loss.backward()
    opt.step()
    train_loss += loss.item()
    if epoch % 100 == 0:
        print(f"Loss: {loss.item()}")

This works with no problems, so I think there might be a problem with DataLoader when passing images to the model. I really appreciate your help and time, thank you so much.

harsha_g · June 25, 2020, 7:32am

I just ran the code with the data and I see no reason to believe there is anything wrong with it. I ran it for 100 epochs with a batch size of 2500 and saw a peak of 60% utilization on my GPU (Tesla P100). Each epoch takes at most 5 seconds.

Jaredeco · June 25, 2020, 7:39am

When I ran it with a batch size of 2500, it did not use the GPU at all; the only thing it used was RAM. My GPU is a GTX 1050 Ti, TensorFlow runs on it without problems, and I did not even get through one epoch in 5 minutes. I do not know what the problem might be; this is a nightmare, and nothing I try solves it.
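TensorFlow working on the same GPU does not show that the installed PyTorch build can use it, so it may help to print what PyTorch itself sees. A minimal diagnostic sketch; cuda_report is a hypothetical helper name. A CPU-only PyTorch build reports torch.version.cuda as None:

```python
import torch


def cuda_report():
    """Summarize what this PyTorch install can see (diagnostic sketch)."""
    info = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for a CPU-only build
        "cuda_available": torch.cuda.is_available(),
        "cudnn": torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None,
    }
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(cuda_report())
```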

harsha_g · June 25, 2020, 7:41am

Are you saying that even after 5 minutes, you did not finish at least 1 epoch?

Jaredeco · June 25, 2020, 7:42am

Yes, it is that slow, and I do not know why. I will try running it in Colab to see whether the problem is my PC.

Jaredeco · June 25, 2020, 7:54am

Now I tried it with the CIFAR-10 dataset: it used the GPU only occasionally, at most 20%, and it was very slow.

Jaredeco · June 25, 2020, 8:37am

I reinstalled torch, and now CIFAR-10 from the PyTorch datasets uses the GPU at 50%, but there is no improvement with the fruit dataset. I do not think the dataset itself is the problem, though, because a TensorFlow model worked fine with it.

Jaredeco · June 25, 2020, 2:47pm

Please, can anyone help with this? I have been stuck on this CNN for a long time.

Jaredeco · June 25, 2020, 3:39pm

I have just found out that the problem is not in the model or CUDA; it must be in the Dataset or DataLoader, because CIFAR-10 from PyTorch runs without problems. I do not think the data itself is the problem, because there are many models on GitHub trained on fruit datasets, and it also does not work with the dogs vs. cats dataset. So I think there is some problem with custom datasets and DataLoader. Do you have any suggestions? Thanks for your help.
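Since the in-memory CSV version keeps the GPU busy, one workaround worth trying is to make the image pipeline look like that case: decode and transform every image once, keep the resulting tensors on the GPU, and slice mini-batches from them. A minimal sketch, assuming the dataset fits in memory (at 32x32, one float32 RGB image is 3*32*32*4 = 12,288 bytes) and that the transform is deterministic, since random augmentations would be frozen after the first pass. preload and gpu_batches are hypothetical helper names, and the TensorDataset stands in for datasets.ImageFolder(train_path, transform=transform):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def preload(dataset, batch_size=256, num_workers=0):
    """Decode and transform every sample once; return (images, labels) tensors."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    xs, ys = [], []
    for x, y in loader:
        xs.append(x)
        ys.append(y)
    return torch.cat(xs), torch.cat(ys)


def gpu_batches(images, labels, batch_size, device):
    """Yield shuffled mini-batches from tensors that already live on `device`."""
    perm = torch.randperm(images.size(0), device=device)
    for start in range(0, images.size(0), batch_size):
        idx = perm[start:start + batch_size]
        yield images[idx], labels[idx]


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Stand-in for datasets.ImageFolder(train_path, transform=transform)
    fake = TensorDataset(torch.rand(1000, 3, 32, 32), torch.randint(0, 131, (1000,)))
    images, labels = preload(fake)
    images, labels = images.to(device), labels.to(device)  # one transfer, up front
    for xb, yb in gpu_batches(images, labels, 200, device):
        pass  # forward / backward / opt.step() as usual
```

The design trades a one-time decoding pass for epochs that never touch the CPU image pipeline, which also helps separate "DataLoader is slow" from "the model is slow".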

harsha_g · June 25, 2020, 3:49pm

@Jaredeco Did you try tuning the num_workers argument? See: Guidelines for assigning num_workers to DataLoader
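A quick way to act on this is to time one pass over the loader for a few num_workers values. A minimal sketch; time_loader is a hypothetical helper name, and the random TensorDataset is a stand-in for train_data. On Windows, DataLoader workers are started with spawn, so a script using num_workers > 0 needs the if __name__ == "__main__": guard. Worker startup, which repeats every epoch by default, is included in the measured time:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_loader(dataset, num_workers, batch_size=200, pin_memory=False):
    """Seconds needed to iterate once over `dataset` with the given settings."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_workers, pin_memory=pin_memory)
    start = time.perf_counter()
    for _ in loader:
        pass  # only measure how fast batches are produced
    return time.perf_counter() - start


if __name__ == "__main__":  # needed on Windows when num_workers > 0
    ds = TensorDataset(torch.rand(2000, 3, 32, 32), torch.randint(0, 131, (2000,)))
    for workers in (0, 2, 4):
        print(workers, f"{time_loader(ds, workers):.3f}s")
```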

Other than that, I don't really have much to add. I hope your issue is resolved quickly.