What is the best way to perform hyperparameter search in PyTorch?

What is the best way to perform hyperparameter search in PyTorch? Are there frameworks that can ease this process?


@kevinzakka has implemented hypersearch.
There are still some TODOs, so alternatively you could have a look at skorch, which lets you use scikit-learn's grid search / random search.


An example:

import torch
from skorch import NeuralNetRegressor
from sklearn.model_selection import GridSearchCV

# D_in, H, D_out (layer sizes) and the data X, y are assumed to be defined already.

class Net(torch.nn.Module):
    def __init__(self):
        """
        A feedforward neural network.
            n_feature: number of features in your data
            n_hidden:  number of neurons in the hidden layer
            n_output:  number of neurons in the output layer (default=1)
        """
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(D_in, H, bias=True)     # hidden layer
        self.predict = torch.nn.Linear(H, D_out, bias=True)   # output layer
        self.n_feature, self.n_hidden, self.n_output = D_in, H, D_out

    def forward(self, x, **kwargs):
        """
            x: Features to predict
        """
        x = torch.sigmoid(self.hidden(x))      # activation function for hidden layer
        x = torch.sigmoid(self.predict(x))     # sigmoid on the output too, so predictions lie in (0, 1)
        return x

net = NeuralNetRegressor(Net,
                         max_epochs=100,
                         lr=0.001,
                         verbose=1)

X_trf = X
y_trf = y.reshape(-1, 1)   # skorch regressors expect a 2-D target

params = {
    'lr': [0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3],
    'max_epochs': list(range(500, 5500, 500)),
}

gs = GridSearchCV(net, params, refit=False, scoring='r2', verbose=1, cv=10)

gs.fit(X_trf, y_trf)
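
Before launching a grid like the one above, it can help to count what it will cost. Here is a rough sketch using only the standard library. The variable names are illustrative and not part of skorch or scikit-learn, and it assumes every fit runs its full max_epochs with no early stopping:

```python
from math import prod

# Same grid and fold count as in the example above.
params = {
    'lr': [0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3],
    'max_epochs': list(range(500, 5500, 500)),
}
cv = 10

# Number of hyperparameter combinations in the grid.
n_candidates = prod(len(values) for values in params.values())

# Each candidate is fitted once per fold; refit=False means no extra final fit.
n_fits = n_candidates * cv

# Each lr value is trained once per max_epochs setting and per fold.
total_epochs = len(params['lr']) * sum(params['max_epochs']) * cv

print(n_candidates, n_fits, total_epochs)
```

That comes to 70 candidates, 700 fits and 1,925,000 training epochs, so it may be worth trimming the grid or lowering cv before running it on a large model.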

Hi Ptrblck,

I hope you are doing well. Sorry to take your time. I want to tune the following hyperparameters: number of CNN layers (2 or 3), number of filters per CNN layer, number of FC layers (2 or 3), number of neurons ([100:10:100]), batch size {100, 200}, LR {10^-4, 10^-5}, and dropout {0.3, 0.5, 0.7}.

What would you suggest? Is there a function for this in PyTorch, or is it better to do a grid search over all combinations?
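
For a search space like the one described above, one option is to write the grid down explicitly and then either iterate over all of it or draw a fixed budget of random configurations. The sketch below uses only the Python standard library. The key names (n_conv_layers and so on) are made up for illustration, and the number of filters and the neuron range are left out because the post does not pin down their values. The training call itself is not shown, since it depends on your model.

```python
import itertools
import random

# Values taken from the post; the key names are hypothetical.
search_space = {
    'n_conv_layers': [2, 3],
    'n_fc_layers': [2, 3],
    'batch_size': [100, 200],
    'lr': [1e-4, 1e-5],
    'dropout': [0.3, 0.5, 0.7],
}

def full_grid(space):
    """Return every combination as a list of dicts (grid search)."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]

def random_subset(space, n_trials, seed=0):
    """Return n_trials distinct configurations drawn from the grid (random search)."""
    grid = full_grid(space)
    rng = random.Random(seed)
    return rng.sample(grid, min(n_trials, len(grid)))

grid = full_grid(search_space)
trials = random_subset(search_space, n_trials=10)
print(len(grid), len(trials))
```

With these five hyperparameters the full grid has 48 configurations, and adding the filter and neuron counts multiplies that further. Each dict in grid or trials would be passed to your own training function.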

I am a bit skeptical of methods like grid and random search. They are worth trying, but I think experience is key in hyperparameter fine-tuning. These methods are not that useful when a single training run takes a week and you do not have a server with hundreds of GPUs.
For example, switching to an optimizer that converges faster is a cheaper and better way to improve your training. Also, take batch size, for instance: a batch size of 32 in a CNN will tend to perform better than a batch size of 4 or 8 (at least on the dataset I am working with).
Experience plays a big role, I guess.



In my opinion, you are 75% right. In the case of something like a CNN, you can scale down your model procedurally so it takes much less time to train, THEN do hyperparameter tuning. This paper found that running a grid search to obtain the best accuracy possible, THEN scaling up the complexity of the model, led to superior accuracy. It probably would not work in all cases, but it is definitely a good application for grid searches.


I know it has been a long time since this post.
I have spent days trying to find better hyperparameters for my model. I tried many solutions, but with no good results :frowning:
I am using the solution you proposed here with a BERT classification model developed with PyTorch Lightning. I couldn't work out what to pass to the fit method. I tried a tensor DataLoader, but it doesn't work, so I tried the output of the tokenizer, like this:

input_ids_train = encoded_data_train['input_ids'] # Xs of my models
labels_train = torch.tensor(label.values) # the labels of my data

I got a TypeError: forward() missing 1 required positional argument: 'attention_mask', and a FitFailedWarning saying that I need to reshape my data.

Could you please tell me how I can prepare my data (input_ids, attention_mask and labels) so that they can be passed to the fit method?


How can we perform GridSearchCV with pretrained models in PyTorch?


skorch is compatible with these scikit-learn methods so you might want to check it out.


I will check it out. Thank you very much

Hi @ptrblck

I tried applying GridSearchCV to optimize hyperparameters, but after calling search.fit(train_ds, y=None) and checking search.best_score_ to see the best score, it shows nan.

What am I doing wrong when applying GridSearchCV?

My code is as follows.

Thank you

import os
from torch import nn
from torchvision import datasets, models, transforms
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV

data_dir = '/content/drive/MyDrive/raphcatr'
train_transforms = transforms.Compose([
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
val_transforms = transforms.Compose([
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

train_ds = datasets.ImageFolder(
    os.path.join(data_dir, 'train'), train_transforms)
val_ds = datasets.ImageFolder(
    os.path.join(data_dir, 'val'), val_transforms)

class PretrainedModel(nn.Module):
    def __init__(self, output_features):
        super().__init__()
        model = models.resnet18(pretrained=True)
        num_ftrs = model.fc.in_features
        model.fc = nn.Linear(num_ftrs, output_features)
        classifier = nn.Sequential(nn.Dropout(p=0.5),
                                   nn.Linear(num_ftrs, output_features))

        self.model = model

    def forward(self, x):
        return self.model(x)

params = {
    'lr': [0.01, 0.02],
    'max_epochs': [10, 20],
}

net = NeuralNetClassifier(
    PretrainedModel,  # module class defined above
    callbacks=[freezer, lrscheduler, checkpoint],
    device='cuda'  # comment to train on cpu
)

gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', verbose=2)
gs.fit(train_ds, y=None)

Could you check whether you can see all scores, and whether some of the runs produced a NaN loss due to a bad hyperparameter set?
If so, you might want to select the highest score which is not a NaN. I'm not familiar with the internal implementation of skorch's grid search, but I would assume that invalid runs (yielding NaNs) would be removed from the best score.
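
To look at all scores and pick the best one that is not NaN, something like the helper below could work. It is a sketch, not part of skorch or scikit-learn: best_non_nan is a hypothetical name, and it only relies on the 'params' and 'mean_test_score' entries that a fitted GridSearchCV exposes in cv_results_. The numbers in example_results are made up purely to show the behaviour.

```python
import math

def best_non_nan(cv_results):
    """Return (params, score) of the best candidate whose mean test score is not NaN.

    cv_results is any mapping with 'params' and 'mean_test_score' entries,
    such as the cv_results_ attribute of a fitted GridSearchCV.
    Returns None if every score is NaN.
    """
    scored = [
        (params, float(score))
        for params, score in zip(cv_results['params'], cv_results['mean_test_score'])
        if not math.isnan(float(score))
    ]
    if not scored:
        return None
    return max(scored, key=lambda item: item[1])

# Made-up numbers for illustration only; replace with gs.cv_results_.
example_results = {
    'params': [{'lr': 0.01, 'max_epochs': 10}, {'lr': 0.02, 'max_epochs': 10}],
    'mean_test_score': [0.81, float('nan')],
}
print(best_non_nan(example_results))
```

If every entry is NaN, a frequent cause is that each fit raised an exception, which scikit-learn records as NaN by default. Passing error_score='raise' to GridSearchCV makes it re-raise the exception instead, which usually shows what went wrong.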

You could try Ray Tune + PyTorch Lightning. For me, that works better than tuning by hand or using skorch.