I can't use a CUDA GPU tensor with NeuralNet

Hi all,
I defined my module MLPNet and then used NeuralNet to implement a grid search. However, I noticed that I couldn't use my CUDA GPU tensors with it, and I can't figure out where the issue is. On the other hand, if I call the fit method as grid.fit(X_PM10_pt.cpu(), Y_PM10_pt.cpu()), it works, but it doesn't use the GPU for the grid search. So, how can I set everything up to run my grid search on the GPU with skorch instead of on the CPU?

import numpy as np
import torch as pt
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
from numpy import array
from skorch import NeuralNet
from sklearn.model_selection import GridSearchCV, PredefinedSplit

device = "cuda"
X_PM10_pt = pt.tensor(X_pm10, dtype=pt.float32, device=device)
Y_PM10_pt = pt.tensor(Y_pm10, dtype=pt.float32, device=device)

class MLPNet(nn.Module):

    def __init__(self, dropout_rate=0.2, hidden_neurons=10, input_size=4,
                 activation_fn=nn.ReLU(), output_size=1, weight_init=init.normal_):
        super(MLPNet, self).__init__()

        self.input_size = input_size 
        self.hidden_neurons = hidden_neurons
        self.activation_fn = activation_fn
        self.dropout_rate = dropout_rate
        self.output_size = output_size
        #self.weight_init = weight_init

        self.hidden_layer = nn.Linear(self.input_size, self.hidden_neurons)
        self.activation = self.activation_fn
        self.dropout = nn.Dropout(self.dropout_rate)
        self.output_layer = nn.Linear(self.hidden_neurons, self.output_size)

        """
        Inizializza i pesi con la strategia specificata.
        Se weight_init è None, utilizza init.normal_ come strategia di default.
        """
        for module in self.modules():
            if isinstance(module, nn.Linear):
                    weight_init(module.weight)
                    nn.init.constant_(module.bias, 0.1)

    def forward(self, x):
        x = self.activation(self.hidden_layer(x))
        x = self.output_layer(self.dropout(x))
        return x
    
    def check_initialization(self):
        for module in self.modules():
            if isinstance(module, nn.Linear):
                print(f"Weight initialization for layer {module}:")
                print(module.weight)
                print(f"Bias initialization for layer {module}:")
                print(module.bias)



mlp_model = NeuralNet(
    module=MLPNet,
    criterion=nn.MSELoss,
    optimizer=optim.Adam,
    device=device,
    verbose=True
)

param_grid = {
    'module__hidden_neurons': [10, 50, 100], #[100, 50, 10],
    'module__dropout_rate': [0.2, 0.5],
    'module__weight_init': [init.normal_,init.kaiming_normal_],
    'optimizer__lr': [0.01, 0.001],
    'batch_size':  [32, 64, 128],
    'max_epochs': [10, 50, 100] 
}


grid = GridSearchCV(estimator=mlp_model, param_grid=param_grid, cv=ps, scoring="neg_mean_squared_error", verbose=10, error_score='raise')

grid_result = grid.fit(X_PM10_pt, Y_PM10_pt)

How do you know the GPU isn’t used?

Sorry, there is a typo in my post. I meant:
if I call the fit method using grid.fit(X_PM10_pt.cpu(), Y_PM10_pt.cpu()), it works.
Instead, if I run grid.fit(X_PM10_pt.cuda(), Y_PM10_pt.cuda()), I get the error:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
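
For what it's worth, the error itself is not specific to skorch: it is what NumPy raises whenever something tries to convert a CUDA tensor to an array, which is presumably what scikit-learn's validation/indexing does during the search. A minimal reproduction outside of skorch (assuming a CUDA device is available):

import numpy as np
import torch as pt

t = pt.zeros(3, device="cuda")
np.asarray(t.cpu())  # works: the tensor is copied to host memory first
np.asarray(t)        # raises the same TypeError as above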

So, my understanding is that we can't pass a CUDA tensor to the fit method of the grid search object, but it would be possible with the fit method of another skorch object such as NeuralNetRegressor, for example.
Another thing that is not clear to me: after training via grid.fit(X_PM10_pt.cpu(), Y_PM10_pt.cpu()),
mlp_model.initialized_ returns False, so I need to write best_mlp = mlp_model.set_params(**best_params) followed by best_mlp.fit(X_PM10_train_pt, Y_PM10_train_pt) to get my object initialized. So my question is: when using skorch, my embedded NeuralNetRegressor seems not to be initialized at all, and I need to re-fit the object directly.
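
In code, what I end up doing is:

best_params = grid_result.best_params_  # assuming this is where best_params comes from
best_mlp = mlp_model.set_params(**best_params)
best_mlp.fit(X_PM10_train_pt, Y_PM10_train_pt)

(A note on this: GridSearchCV clones the estimator it receives, so mlp_model itself is never fitted; with refit=True, which is the default, the fitted clone is available directly as grid_result.best_estimator_, which avoids the manual re-fit.)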

I’m not familiar enough with skorch, but the issue might come from this higher level API. Which call fails exactly? Maybe you could fix it by moving the data back to the CPU before calling numpy() on it.

The fit method of the GridSearchCV object fails if you give it a CUDA tensor as input, but if you first move the tensor to the CPU, convert it to numpy, and pass that to fit, no error is raised. That's why I posted it here: my understanding is that if you want to tune hyper-parameters with scikit-learn via skorch, you can't use CUDA. I know there are other libraries for tuning hyper-parameters with PyTorch, but using grid search seemed smoother.
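
For reference, the pattern that seems to work is to keep the data that GridSearchCV sees on the CPU (numpy arrays or CPU tensors) and let skorch move the module and each batch to the GPU via device="cuda": the scikit-learn search machinery stays on the CPU, but the forward/backward passes should still run on the GPU. A minimal sketch along those lines, reusing the names from the code above:

# Data stays on the CPU so scikit-learn can split and convert it freely.
X_cpu = X_PM10_pt.cpu().numpy()
Y_cpu = Y_PM10_pt.cpu().numpy()

mlp_model = NeuralNet(
    module=MLPNet,
    criterion=nn.MSELoss,
    optimizer=optim.Adam,
    device="cuda",  # skorch moves the module and each training batch to the GPU
    verbose=True,
)

grid = GridSearchCV(estimator=mlp_model, param_grid=param_grid, cv=ps,
                    scoring="neg_mean_squared_error", verbose=10, error_score="raise")
grid_result = grid.fit(X_cpu, Y_cpu)  # CPU inputs, GPU training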