GridSearchCV with PyTorch tensors on CUDA

I am using GridSearchCV for cross-validation as shown here (https://github.com/skorch-dev/skorch). However, it seems I am not able to pass X and y as PyTorch tensors already loaded on the GPU. This is my unsuccessful attempt:

device = "cuda"
X = torch.from_numpy(X).to(device)
y = torch.from_numpy(y).to(device)
gs.fit(X, y)

The error I get is "can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first." It appears during gs.fit(X, y). Of course, if I move the tensors back with .cpu() it works, but that defeats the purpose, because I'd like to avoid transferring data back and forth between main memory and the GPU. Is there anything I am missing?
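For completeness, the workaround the error message points to looks roughly like this (copying the tensors back to host memory before fitting), which is exactly the transfer I would like to avoid:

# Workaround suggested by the error message: move the tensors back to the CPU
# before handing them to GridSearchCV. This works, but reintroduces the
# host <-> GPU copies in question.
gs.fit(X.cpu(), y.cpu())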

This means X and y are already PyTorch tensors, so you don't need to convert them from NumPy. Just remove the first three lines 🙂
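For example, here is a minimal sketch of what that looks like, assuming gs is a GridSearchCV wrapped around a skorch NeuralNetClassifier as in the linked README (the parameter grid below is only a placeholder):

# net is assumed to be a skorch NeuralNetClassifier; the parameter grid is a placeholder.
params = {
    'lr': [0.01, 0.1],
    'max_epochs': [10, 20],
}
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy')

# X and y stay as NumPy arrays; skorch converts them to tensors and moves
# each batch to the device configured on the net.
gs.fit(X, y)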

Thanks! I also just read in the skorch documentation that fit() converts X and y to PyTorch tensors; however, it doesn't say whether it transfers the data to the GPU. To make this more concrete, below is the code I am running; the problem appears on net.fit() as well. Here X and y are plain numpy.ndarrays. Used as such, they would live in main memory and not on the GPU, which is why I am converting them myself, or am I wrong? Because of this, the following code runs about 1.5× faster on the CPU than on the GPU.

import numpy as np
from sklearn.datasets import make_classification
from torch import nn
import torch.nn.functional as F
import torch
from skorch import NeuralNetClassifier
from datetime import datetime
from dateutil.relativedelta import relativedelta
from sklearn.model_selection import GridSearchCV


def diff(t_a, t_b):
    t_diff = relativedelta(t_b, t_a)  # later/end time comes first!
    return '{h}h {m}m {s}s'.format(h=t_diff.hours, m=t_diff.minutes, s=t_diff.seconds)

class MyModule(nn.Module):
    def __init__(self, num_units=10, nonlin=F.relu):
        super(MyModule, self).__init__()

        self.dense0 = nn.Linear(20, num_units)
        self.nonlin = nonlin
        self.dropout = nn.Dropout(0.5)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 2)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = F.relu(self.dense1(X))
        X = F.softmax(self.output(X), dim=-1)
        return X

device = "cuda"
net = NeuralNetClassifier(
    MyModule,
    max_epochs=100,
    lr=0.1,
    # Shuffle training data on each epoch
    iterator_train__shuffle=True,
    device=device,
)

X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)
y = y.astype(np.int64)

X = torch.from_numpy(X).to(device)
y = torch.from_numpy(y).to(device)

t_a = datetime.now()
net.fit(X, y)
y_proba = net.predict_proba(X)
t_b = datetime.now()
print(diff(t_a, t_b))
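
For reference, a rough sketch of the earlier suggestion applied to this script: drop the two torch.from_numpy(...).to(device) lines and pass the NumPy arrays directly, relying on the device="cuda" argument on the NeuralNetClassifier to move each batch to the GPU. This is an untested sketch based on the reply above, not a verified fix.

# Keep X and y as NumPy arrays; skorch converts them to tensors and moves
# each batch to the device given by device="cuda" on the net above.
X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)
y = y.astype(np.int64)

t_a = datetime.now()
net.fit(X, y)                    # no manual torch.from_numpy(...).to(device)
y_proba = net.predict_proba(X)
t_b = datetime.now()
print(diff(t_a, t_b))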